libstdc++: Type-erase chrono-data for formatting [PR110739]

This patch reworks formatting of the chrono types so that they are all
formatted in terms of the _ChronoData class, which includes all required fields.
Each required field is populated in the formatter for the specific type,
based on the chrono-spec used.

To facilitate the above, _ChronoSpec now includes an additional _M_needed field
that represents the chrono data referenced by the format spec (this value
is also configured for __defSpec). This value differs from the value of
__parts passed to _M_parse, which includes all fields that can be computed
from the input (e.g. weekday_indexed can be computed for year_month_day). Later
it is used when filling _ChronoData, in particular by the _M_fill_* family of
functions, to determine whether a given field needs to be set, and thus whether
its value needs to be computed.

In consequence, the _ChronoParts enum was extended with additional values that
allow more fine-grained identification:
* _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
* _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
* _LocalDays and _WeekdayIndex are defined and included in _Date,
* _Duration is removed, and instead _EpochUnits and _UnitSuffix are
   introduced.
Furthermore, to avoid name conflicts, _ChronoParts is now defined as an enum
class, with additional operators that simplify its use.
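
For illustration only, a minimal sketch of a scoped enum used as a flag set
with the kind of operators listed in the ChangeLog below (&, &=, -, -= and
comparison with nullptr).  The type name Parts, its enumerators, their bit
values and operator| are assumptions made for this sketch, not the
libstdc++ definitions:

```cpp
#include <cstddef>

// Hypothetical flag set; names and bit values are illustrative only.
enum class Parts : unsigned
{
  None  = 0,
  Year  = 1u << 0,
  Month = 1u << 1,
  Day   = 1u << 2,
  Date  = Year | Month | Day,  // composite value covering several fields
};

// Union of two flag sets (helper for this sketch).
constexpr Parts operator|(Parts a, Parts b) noexcept
{ return static_cast<Parts>(static_cast<unsigned>(a) | static_cast<unsigned>(b)); }

// Intersection of two flag sets.
constexpr Parts operator&(Parts a, Parts b) noexcept
{ return static_cast<Parts>(static_cast<unsigned>(a) & static_cast<unsigned>(b)); }

constexpr Parts& operator&=(Parts& a, Parts b) noexcept
{ return a = a & b; }

// Set difference: the flags of a that are not in b.
constexpr Parts operator-(Parts a, Parts b) noexcept
{ return static_cast<Parts>(static_cast<unsigned>(a) & ~static_cast<unsigned>(b)); }

constexpr Parts& operator-=(Parts& a, Parts b) noexcept
{ return a = a - b; }

// Allows "(needed & Parts::Day) == nullptr" as an emptiness test.
constexpr bool operator==(Parts a, std::nullptr_t) noexcept
{ return static_cast<unsigned>(a) == 0; }
```

With such operators a check like "is the day needed" can be written as
!((needed & Parts::Day) == nullptr) without casts at the call site.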

In addition to fields that can be printed using chrono-spec, _ChronoData stores:
* Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
   struct tm construction, and for ISO calendar computation.
* Total seconds in wall time (_M_lseconds) - this value may be different from
   sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
   to allow future extension, like printing total minutes.
* Total seconds since epoch - differs from the above because of the offset.
   Again to be used with future extension (e.g. %s as proposed in P2945R1).
* Subseconds - count of attoseconds (10^(-18)); in addition to printing, it
   can be used to compute fractional hours and minutes.
Both total-seconds fields use a single _TotalSeconds enumerator in
_ChronoParts, which, when present in combination with _EpochUnits or
_LocalDays, indicates that _M_eseconds (_EpochSeconds) or _M_lseconds
(_LocalSeconds) are provided/required.

To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
format_args mechanism, where the result of +d.count() (see LWG4118) is erased
via make_format_args into a local __arg_store, which is later referenced by
_M_ereps (_M_ereps.get(0)).

To handle precision values, and in preparation for allowing the user to
configure them, we store the precision as third element of _M_ereps
(_M_ereps.get(2)), this allows duration with precision to be printed using
"{0:{2}}". For subseconds the precision is handled differently depending on
the representation:
* for integral reps, _M_subseconds value is used to determine fractional value,
   precision is trimmed to 18 digits;
* for floating-points, _M_ereps stores duration<Rep> initialized with only
   fractional seconds, that is later formatted with precision.
Always using the _M_subseconds field for integral durations means that we do
not use the formatter for user-defined durations that are considered to be
integral (see empty_spec.cc file change). To avoid potentially expensive
computation of _M_subseconds, we make sure that _ChronoParts::_Subseconds is
set only if _Subseconds are needed. In particular we remove this flag for
localized output in _M_parse.
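
A small sketch of the integral case, assuming a millisecond duration and
non-negative values: the whole seconds and the subsecond remainder are
computed separately and the fraction is printed with a fixed number of
digits.  The function name seconds_with_fraction is hypothetical and the
code only mirrors the idea, not the libstdc++ implementation:

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: prints seconds-within-minute plus milliseconds,
// e.g. "01.250".  Assumes d is non-negative.
inline std::string seconds_with_fraction(std::chrono::milliseconds d)
{
  using namespace std::chrono;
  const auto whole = floor<seconds>(d);  // whole seconds, rounded down
  const milliseconds frac = d - whole;   // remainder in [0, 999]
  char buf[32];
  std::snprintf(buf, sizeof buf, "%02lld.%03lld",
                static_cast<long long>(whole.count() % 60),
                static_cast<long long>(frac.count()));
  return buf;
}
```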

Construction of the _M_ereps as described above is handled by __formatter_duration,
that is then used to format duration, hh_mm_ss and time_points specializations.
This class also handles _UnitSuffix, the _M_units_suffix field is populated
either with predefined suffix (chrono::__detail::__units_suffix) or one produced
locally.

Finally, formatters for the types listed below contain type-specific logic:
* hh_mm_ss - we do not compute total duration and seconds, unless explicitly
   requested, as such computation may overflow;
* utc_time - for time during leap second insertion, the _M_seconds field is
   increased to 60;
* __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
   abbreviation (_ZoneAbbrev) is requested, but corresponding pointer is null,
   furthermore conversion from `char` to `wchar_t` for abbreviation is performed
   if needed.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::__no_timezone_available):
Removed, replaced with separate throws in formatter for
__local_time_fmt
(__format::_ChronoParts): Defined additional enumerators and
declared as enum class.
(__format::operator&(_ChronoParts, _ChronoParts))
(__format::operator&=(_ChronoParts&, _ChronoParts))
(__format::operator-(_ChronoParts, _ChronoParts))
(__format::operator-=(_ChronoParts&, _ChronoParts))
(__format::operator==(_ChronoParts, decltype(nullptr)))
(_ChronoSpec::_M_time_only, _ChronoSpec::_M_floating_point_rep)
(_ChronoSpec::_M_custom_rep, _ChronoSpec::_M_needed)
(_ChronoSpec::_M_needs, __format::_ChronoData): Define.
(__format::__formatter_chrono): Redefine to accept _ChronoData.
(__formatter_chrono::_M_format_to_ostream): Moved to
__formatter_duration.
(__format::__formatter_duration): Define.
(__formatter_chrono_info::format): Pass value-constructed
_ChronoData.
(std::formatter<chrono::day, _CharT>)
(std::formatter<chrono::month, _CharT>)
(std::formatter<chrono::year, _CharT>)
(std::formatter<chrono::weekday, _CharT>)
(std::formatter<chrono::weekday_indexed, _CharT>)
(std::formatter<chrono::weekday_last, _CharT>)
(std::formatter<chrono::month_day, _CharT>)
(std::formatter<chrono::month_day_last, _CharT>)
(std::formatter<chrono::month_weekday, _CharT>)
(std::formatter<chrono::month_weekday_indexed, _CharT>)
(std::formatter<chrono::month_weekday_last, _CharT>)
(std::formatter<chrono::year_month, _CharT>)
(std::formatter<chrono::year_month_day, _CharT>)
(std::formatter<chrono::year_month_day_last, _CharT>)
(std::formatter<chrono::year_month_weekday, _CharT>)
(std::formatter<chrono::year_month_weekday_indexed, _CharT>)
(std::formatter<chrono::year_month_weekday_last, _CharT>):
Construct _ChronoData in format, and configure _M_needed in
_ChronoSpec.
(std::formatter<chrono::duration<_Rep, _Period>, _CharT>)
(std::formatter<chrono::hh_mm_ss<_Duration>, _CharT>)
(std::formatter<chrono::sys_time<_Duration>, _CharT>)
(std::formatter<chrono::utc_time<_Duration>, _CharT>)
(std::formatter<chrono::tai_time<_Duration>, _CharT>)
(std::formatter<chrono::gps_time<_Duration>, _CharT>)
(std::formatter<chrono::file_time<_Duration>, _CharT>)
(std::formatter<chrono::local_time<_Duration>, _CharT>)
(std::formatter<chrono::_detail::__local_time_fmt<_Duration>, _CharT>):
Reworked in terms of __formatter_duration and _ChronoData.
(std::formatter<chrono::_detail::__utc_leap_second<_Duration>, _CharT>):
Removed.
(_Parser<_Duration>::operator()): Adjusted for _ChronoParts
being enum class.
* include/std/chrono (__detail::__utc_leap_second): Removed,
replaced with simply bumping _M_seconds in _ChronoData.
* testsuite/std/time/format/empty_spec.cc: Updated %S integral
output.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Implement C++26 P2927R3 - Inspecting exception_ptr

The following patch attempts to implement the C++26 P2927R3 - Inspecting exception_ptr
paper (but not including P3748R0, I plan to play with it incrementally and
it will really depend on the Constexpr exceptions patch).

The function template is implemented using an out of line private method of
exception_ptr, so that P3748R0 then can use if consteval and provide a
constant evaluation variant of it.
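
For readers unfamiliar with the facility, a sketch of roughly equivalent
behaviour written with pre-C++26 tools (rethrow and catch).  This is not how
the patch implements it (the patch uses an out of line member function), and
the name exception_ptr_cast_sketch is hypothetical.  Note that the standard
leaves it unspecified whether rethrow_exception rethrows the same object or
a copy, so using the returned pointer assumes an implementation that
rethrows the same object, as libstdc++ does:

```cpp
#include <exception>
#include <stdexcept>

// Returns a pointer to the stored exception if it can be caught as E,
// otherwise nullptr.  The pointee is kept alive by the exception_ptr.
template<typename E>
const E* exception_ptr_cast_sketch(const std::exception_ptr& p) noexcept
{
  if (!p)
    return nullptr;
  try
    { std::rethrow_exception(p); }
  catch (const E& e)
    { return &e; }
  catch (...)
    { return nullptr; }
}
```

The rethrow-based variant pays for a throw on every query, which is one
reason a direct inspection facility is attractive.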

2025-06-26 Jakub Jelinek <jakub@redhat.com>

* include/bits/version.def (exception_ptr_cast): Add.
* include/bits/version.h: Regenerate.
* libsupc++/exception: Define __glibcxx_want_exception_ptr_cast before
including bits/version.h.
* libsupc++/exception_ptr.h (std::exception_ptr_cast): Define.
(std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Declare.
* libsupc++/eh_ptr.cc
(std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Define.
* src/c++23/std.cc.in (std::exception_ptr_cast): Export.
* config/abi/pre/gnu.ver: Export
_ZNKSt15__exception_ptr13exception_ptr21_M_exception_ptr_castERKSt9type_info
at CXXABI_1.3.17.
* testsuite/util/testsuite_abi.cc (check_version): Allow CXXABI_1.3.17.
* testsuite/18_support/exception_ptr/exception_ptr_cast.cc: New test.

c++, libstdc++: Implement C++26 P2830R10 - Constexpr Type Ordering

The following patch attempts to implement the C++26 P2830R10 - Constexpr Type
Ordering paper, with a minor change that std::type_order<T, U> class template
doesn't derive from integral_constant, because std::strong_ordering is not
a structural type (except in MSVC), so instead it is just a class template
with static constexpr strong_ordering value member and also value_type,
type and 2 operators.

The paper mostly talks about using something other than mangled names for
the ordering, but given that the mangler is part of the GCC C++ FE, using
the mangler seems to be the best ordering choice to me.

2025-06-26 Jakub Jelinek <jakub@redhat.com>

gcc/cp/
* cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering.
(TYPE_ORDER): New.
* method.cc (type_order_value): Define.
* cp-tree.h (type_order_value): Declare.
* semantics.cc (trait_expr_value): Use gcc_unreachable also
for CPTK_TYPE_ORDER, adjust comment.
(finish_trait_expr): Handle CPTK_TYPE_ORDER.
* constraint.cc (diagnose_trait_expr): Likewise.
gcc/testsuite/
* g++.dg/cpp26/type-order1.C: New test.
* g++.dg/cpp26/type-order2.C: New test.
* g++.dg/cpp26/type-order3.C: New test.
libstdc++-v3/
* include/bits/version.def (type_order): New.
* include/bits/version.h: Regenerate.
* libsupc++/compare: Define __glibcxx_want_type_order before
including bits/version.h.
(std::type_order, std::type_order_v): New trait and template variable.
* src/c++23/std.cc.in (std::type_order, std::type_order_v): Export.
* testsuite/18_support/comparisons/type_order/1.cc: New test.

i386: Introduce crc_rev<mode>si4 expanders [PR120719]

Introduce crc_rev<mode>si4 expanders to generate CRC32 instruction when using
__builtin_rev_crc32_data* builtins with 0x1EDC6F41 polynomial and -mcrc32.
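
For reference, a portable bitwise sketch of the bit-reflected CRC update for
this polynomial (CRC-32C), which is the operation the x86 crc32 instruction
performs on one byte.  The constant 0x82F63B78 is 0x1EDC6F41 with its bits
reversed.  The function names are hypothetical, and the wrapper adds the
conventional initial value and final inversion, which the raw update does
not include:

```cpp
#include <cstddef>
#include <cstdint>

// One reflected CRC-32C step over a single byte (no init/final inversion).
constexpr std::uint32_t crc32c_byte(std::uint32_t crc, std::uint8_t data) noexcept
{
  crc ^= data;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
  return crc;
}

// Conventional CRC-32C of a buffer: init 0xFFFFFFFF, final bitwise NOT.
constexpr std::uint32_t crc32c(const char* s, std::size_t n) noexcept
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < n; ++i)
    crc = crc32c_byte(crc, static_cast<std::uint8_t>(s[i]));
  return ~crc;
}
```

The standard check string "123456789" gives 0xE3069283 with this wrapper.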

PR target/120719

gcc/ChangeLog:

* config/i386/i386.md (crc_rev<SWI124:mode>si4): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/i386/crc-builtin-crc32.c: New test.

RISC-V: Fix build issue

Apparently I forgot to squash this fix into the previous commit before
pushing...

gcc/ChangeLog:

* config/riscv/riscv.md: Fix build issue.

lto-ltrans-cache: Remove unused private member

When building GCC with clang, it warns that the private member suffix
in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used
which indeed looks like it is the case. This patch therefore removes
it along with its initialization in the constructor.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* lto-ltrans-cache.h (class ltrans_file_cache): Remove member prefix.
* lto-ltrans-cache.cc (ltrans_file_cache::ltrans_file_cache): Do
not initialize member prefix.

RISC-V: Add comment and reorder the include files in riscv.md [NFC]

This patch adds a comment to the riscv.md file to clarify the purpose of
the file and reorders the include files for better organization.

gcc/ChangeLog:

* config/riscv/riscv.md: Add comment and reorder include
files.

tree-vect-stmts.cc: Remove an unused shadowed variable

When compiling tree-vect-stmts.cc with clang, it emits a warning:

  gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' [-Wunused-variable]

And indeed, there are two mode_iter local variables in function
supportable_indirect_convert_operation and the first one is not used
at all.  This patch removes it.

gcc/ChangeLog:

2025-06-24  Martin Jambor  <mjambor@suse.cz>

* tree-vect-stmts.cc (supportable_indirect_convert_operation):
Remove an unused shadowed variable.

Silence a clang warning in tree-vect-slp.cc about an unused variable

Since r15-4695-gd17e672ce82e69 (Richard Biener: Assert finished
vectorizer pattern COND_EXPR transition), the static const array
cond_expr_maps is unused and when GCC is compiled with clang, it warns
about that.

This patch simply removes the variable.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* tree-vect-slp.cc (cond_expr_maps): Remove.

fortran: Avoid freeing uninitialized value

When compiling fortran/match.cc, clang emits a warning

fortran/match.cc:5301:7: warning: variable 'p' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]

which looks accurate, so this patch adds an initialization of p to
avoid the use.

gcc/fortran/ChangeLog:

2025-06-23 Martin Jambor <mjambor@suse.cz>

* match.cc (gfc_match_nullify): Initialize p to NULL.

Add testcase for afdo offlining and fix two bugs

This patch adds a testcase checking that offlining works and profile info is
not lost.  While doing it I noticed a pasto that made the dump to be "afdo"
and not "afdo_offline" and also that not all functions are processed, as the
range-based for loop does not expect new values to be added to the vector.
Fixed thus.
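
The second bug is a general C++ pitfall: a range-based for loop caches its
iterators, so elements appended during the loop are either skipped or, after
a reallocation, the loop reads through invalidated iterators.  A minimal
sketch of the index-based alternative, with a hypothetical function
process_all and an invented rule for generating follow-up items:

```cpp
#include <cstddef>
#include <vector>

// Processes every item, including items appended while iterating.
inline std::vector<int> process_all(std::vector<int> worklist)
{
  std::vector<int> visited;
  // Re-read size() on each iteration so newly pushed items are seen.
  for (std::size_t i = 0; i < worklist.size(); ++i)
    {
      const int item = worklist[i];  // copy: push_back may reallocate
      visited.push_back(item);
      if (item > 1)
        worklist.push_back(item / 2);  // follow-up work (illustrative rule)
    }
  return visited;
}
```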

gcc/ChangeLog:

* auto-profile.cc (function_instance::merge): Add TODO.
(autofdo_source_profile::offline_external_functions):
Do not use range for on the worklist.
* timevar.def (TV_IPA_AUTOFDO_OFFLINE): New timevar.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-crossmodule-1.c: New test.
* gcc.dg/tree-prof/afdo-crossmodule-1b.c: New test.

Fortran: Prevent creation of unused tree.

gcc/fortran/ChangeLog:

* trans.cc (gfc_allocate_using_malloc): Prevent possible memory
leak when allocation was already done.

Fortran: Fix wasting memory in coarray single mode.

gcc/fortran/ChangeLog:

* resolve.cc (resolve_fl_derived0): Do not create the token
component when not in coarray lib mode.
* trans-types.cc: Do not access the token when not in coarray
lib mode.

Fortran: Fix out of bounds access in structure constructor's clean up [PR120711]

A structure constructor's generated clean up code was using an offset
variable, which was manipulated before the clean up was run leading to
an out of bounds access.

PR fortran/120711

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_array_ctor_element): Store the value
of the offset for reuse.

gcc/testsuite/ChangeLog:

* gfortran.dg/asan/array_constructor_1.f90: New test.

Avoid some lost AFDO profiles with LTO

This patch fixes some of the cases where we lose profile info because we do not
perform inlining that happened at train run before AFDO annotation is done.
This is a common problem with LTO in the case cross-module inlining happened.

I added afdo_offline pass that does two things:
1) collect set of all functions defined in current unit
2) walk all toplevel function instances.  If a function instance corresponds
    to a defined symbol, walk everything inlined to it.  If crossmodule
    inlining is seen, remove the inline instances and recursively look into
    inline instances that go back to the current unit and turn them to offline
    ones

    If function instance corresponds to external symbol, remove it but
    also look for functions inlined to it that belong to current module.

When merging profile we also need to recursively merge profiles of inlined
functions and if the inlining decisions do not match, offline the bodies.
This is somewhat fragile since recursive calls may trigger modifications of
functions currently being merged, but I hope I chased away problems with that -
will give it a second thought to see if this can be reorganized into a worklist
fashion that is more safe.

I noticed that functions may appear in the afdo data either as their
symbol name or dwarf name (since inline functions may not have known symbol
name).  There is already some logic to handle that but it is broken in the
case both names are used.

To mitigate the problem I also added logic to translate dwarf names
to symbol names in case both are used.  This prevents profile loss i.e.
in exchange2.  Here digits_2 function appears by its dwarf name (digits_2)
but also is cloned which makes it to appear by its symbol name (__*digits_2).

All profile massaging is done before early optimization so the VPT targets of
offline bodies are correct.  We still will lose profile if early inlining
fails.  I will add second pass to afdo to offline these.

Last problem is that in case we early inlined more than expected (which now
happens more often due to offlining) the profile will be lost and filled by
static profile.  Problem here is that we need to somehow scale the profile of
inline instance but I do not see how to determine invocation counts.  Will try
to look into that incrementally - perhaps we can keep some info from offlining.

There is also now a dump infrastructure that prints the profile in
the same format as dump_gcov tool.

autoprofiledbootstrapped, regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

* auto-profile.cc (name_index_set, name_index_map): New types.
(dump_afdo_loc): New function.
(dump_inline_stack): Simplify.
(function_instance::merge): Merge recursively inlined functions;
offline if necessary; collect new functions.
(function_instance::offline): New member function.
(function_instance::offline_if_in_set): New member function.
(function_instance::remove_external_functions): New member function.
(function_instance::dump): New member function.
(function_instance::debug): New member function.
(function_instance::dump_inline_stack): New member function.
(function_instance::find_icall_target_map): Use removed_icall_target.
(function_instance::remove_icall_target): Only mark icall target removed.
(autofdo_source_profile::offline_external_functions): New function.
(function_instance::read_function_instance): Record inlined_to pointers;
use -1 for unknown head counts.
(autofdo_source_profile::get_function_instance_by_name_index): New
function.
(autofdo_source_profile::add_function_instance): New member function.
(autofdo_source_profile::read): Do not leak memory; fix formatting.
(read_profile): Fix formatting.
(afdo_annotate_cfg): Likewise.
(class pass_ipa_auto_profile_offline): New pass.
(make_pass_ipa_auto_profile_offline): New function.
* passes.def (pass_ipa_auto_profile_offline): Add.
* tree-pass.h (make_pass_ipa_auto_profile): Declare.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.

x86: Also handle all 1s float vector constant

Since float vector constant

(const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4])

is an all 1s float vector constant, update the remove_redundant_vector
pass to replace

(insn 20 18 21 2 (set (reg:V4SF 124)
        (const_vector:V4SF [
                (const_double:SF -QNaN [-QNaN]) repeated x4
            ])) "x.cc":26:5 2426 {movv4sf_internal}
     (nil))

with

(insn 49 2 5 2 (set (reg:V16QI 135)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])) -1
     (nil))
...
(insn 20 18 21 2 (set (reg:V4SF 124)
        (subreg:V4SF (reg:V16QI 135) 0)) "x.cc":26:5 2426 {movv4sf_internal}
     (nil))

gcc/

PR target/120819
* config/i386/i386-features.cc (ix86_broadcast_inner): Also handle
all 1s float vector constant.

gcc/testsuite/

PR target/120819
* g++.target/i386/pr120819.C: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Handle REG_EH_REGION note in DEF_INSN

For tcpsock_test.go in libgo tests,

commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 9 07:17:07 2025 +0800

    x86: Extend the remove_redundant_vector pass

added an instruction:

(insn 501 101 102 21 (set (reg:V2DI 234)
        (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
     (nil))

after

(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))

which resulted in

(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))
(insn 501 101 102 21 (set (reg:V2DI 234)
        (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
     (nil))

and caused:

tcpsock_test.go: In function 'net.TestTCPBig..func2':
tcpsock_test.go:684:28: error: in basic block 21:
  684 |                         go func() {
      |                            ^
tcpsock_test.go:684:28: error: flow control insn inside a basic block
(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))
during RTL pass: rrvl
tcpsock_test.go:684:28: internal compiler error: in rtl_verify_bb_insns, at cfgrtl.cc:2834

Copy the REG_EH_REGION note to the newly added instruction and split the
block after the previous instruction.

PR target/120816
* config/i386/i386-features.cc (remove_redundant_vector_load):
Handle REG_EH_REGION note in DEF_INSN.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Add preserve_none and update no_caller_saved_registers attributes

Add preserve_none attribute which is similar to no_callee_saved_registers
attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
used for integer parameter passing.  This can be used in an interpreter
to avoid saving/restoring the registers in functions which process byte
codes.  It improved the pystones benchmark by 6-7%:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15

Remove -mgeneral-regs-only restriction on no_caller_saved_registers
attribute.  Only SSE is allowed since SSE XMM register load preserves
the upper bits in YMM/ZMM register while YMM register load zeros the
upper 256 bits of ZMM register, and preserving 32 ZMM registers can
be quite expensive.

gcc/

PR target/119628
* config/i386/i386-expand.cc (ix86_expand_call): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Look up
preserve_none attribute.  Check preserve_none attribute for
interrupt attribute.  Don't check no_caller_saved_registers nor
no_callee_saved_registers conflicts here.
(ix86_set_func_type): Check no_callee_saved_registers before
checking no_caller_saved_registers attribute.
(ix86_set_current_function): Allow SSE with
no_caller_saved_registers attribute.
(ix86_handle_call_saved_registers_attribute): Check preserve_none,
no_callee_saved_registers and no_caller_saved_registers conflicts.
(ix86_gnu_attributes): Add preserve_none attribute.
* config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p):
New.
* config/i386/i386.cc
(x86_64_preserve_none_int_parameter_registers): New.
(ix86_using_red_zone): Don't use red-zone when there are no
caller-saved registers with SSE.
(ix86_type_no_callee_saved_registers_p): New.
(ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE
and call ix86_type_no_callee_saved_registers_p instead of looking
up no_callee_saved_registers attribute.
(ix86_comp_type_attributes): Call
ix86_type_no_callee_saved_registers_p instead of looking up
no_callee_saved_registers attribute.  Return 0 if preserve_none
attribute doesn't match in 64-bit mode.
(ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE,
use x86_64_preserve_none_int_parameter_registers.
(init_cumulative_args): Set preserve_none_abi.
(function_arg_64): Use x86_64_preserve_none_int_parameter_registers
with preserve_none attribute.
(setup_incoming_varargs_64): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
(ix86_save_reg): Treat TYPE_PRESERVE_NONE like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_nsaved_sseregs): Allow saving XMM registers for
no_caller_saved_registers attribute.
(ix86_compute_frame_layout): Likewise.
(x86_this_parameter): Use
x86_64_preserve_none_int_parameter_registers with preserve_none
attribute.
* config/i386/i386.h (ix86_args): Add preserve_none_abi.
(call_saved_registers_type): Add TYPE_PRESERVE_NONE.
(machine_function): Change call_saved_registers to 3 bits.
* doc/extend.texi: Add preserve_none attribute.  Update
no_caller_saved_registers attribute to remove -mgeneral-regs-only
restriction.

gcc/testsuite/

PR target/119628
* gcc.target/i386/no-callee-saved-3.c: Adjust error location.
* gcc.target/i386/no-callee-saved-19a.c: New test.
* gcc.target/i386/no-callee-saved-19b.c: Likewise.
* gcc.target/i386/no-callee-saved-19c.c: Likewise.
* gcc.target/i386/no-callee-saved-19d.c: Likewise.
* gcc.target/i386/no-callee-saved-19e.c: Likewise.
* gcc.target/i386/preserve-none-1.c: Likewise.
* gcc.target/i386/preserve-none-2.c: Likewise.
* gcc.target/i386/preserve-none-3.c: Likewise.
* gcc.target/i386/preserve-none-4.c: Likewise.
* gcc.target/i386/preserve-none-5.c: Likewise.
* gcc.target/i386/preserve-none-6.c: Likewise.
* gcc.target/i386/preserve-none-7.c: Likewise.
* gcc.target/i386/preserve-none-8.c: Likewise.
* gcc.target/i386/preserve-none-9.c: Likewise.
* gcc.target/i386/preserve-none-10.c: Likewise.
* gcc.target/i386/preserve-none-11.c: Likewise.
* gcc.target/i386/preserve-none-12.c: Likewise.
* gcc.target/i386/preserve-none-13.c: Likewise.
* gcc.target/i386/preserve-none-14.c: Likewise.
* gcc.target/i386/preserve-none-15.c: Likewise.
* gcc.target/i386/preserve-none-16.c: Likewise.
* gcc.target/i386/preserve-none-17.c: Likewise.
* gcc.target/i386/preserve-none-18.c: Likewise.
* gcc.target/i386/preserve-none-19.c: Likewise.
* gcc.target/i386/preserve-none-20.c: Likewise.
* gcc.target/i386/preserve-none-21.c: Likewise.
* gcc.target/i386/preserve-none-22.c: Likewise.
* gcc.target/i386/preserve-none-23.c: Likewise.
* gcc.target/i386/preserve-none-24.c: Likewise.
* gcc.target/i386/preserve-none-25.c: Likewise.
* gcc.target/i386/preserve-none-26.c: Likewise.
* gcc.target/i386/preserve-none-27.c: Likewise.
* gcc.target/i386/preserve-none-28.c: Likewise.
* gcc.target/i386/preserve-none-29.c: Likewise.
* gcc.target/i386/preserve-none-30a.c: Likewise.
* gcc.target/i386/preserve-none-30b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

x86: Add debug dump for the remove_redundant_vector pass

Add debug dump for the remove_redundant_vector pass with the following
output:

Replace:

(insn 7 4 8 2 (set (reg:V2DI 103)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])) "x.c":8:13 2406 {movv2di_internal}
     (nil))

with:

(insn 7 4 8 2 (set (reg:V2DI 103)
        (subreg:V2DI (reg:V32QI 109) 0)) "x.c":8:13 2406 {movv2di_internal}
     (nil))

...

Replace:

(insn 16 15 17 3 (set (reg:V4DI 105)
        (const_vector:V4DI [
                (const_int 0 [0]) repeated x4
            ])) "x.c":13:28 2405 {movv4di_internal}
     (nil))

with:

(insn 16 15 17 3 (set (reg:V4DI 105)
        (subreg:V4DI (reg:V32QI 109) 0)) "x.c":13:28 2405 {movv4di_internal}
     (nil))

...

Place:

(insn 25 5 23 2 (set (reg:V32QI 109)
        (const_vector:V32QI [
                (const_int 0 [0]) repeated x32
            ])) -1
     (nil))

after:

(insn 23 25 24 2 (set (reg/f:DI 107 [ mem1 ])
        (reg:DI 5 di [ mem1 ])) "x.c":5:1 95 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 5 di [ mem1 ])
        (nil)))

in the *.309r.rrvl debug dump.

* config/i386/i386-features.cc (ix86_place_single_vector_set):
Add debug dump.
(replace_vector_const): Likewise.
(remove_redundant_vector_load): Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

arc: Use intrinsics for __builtin_mul_overflow ()

This patch handles both signed and unsigned builtin multiplication
overflow.

Uses the "mpy.f" instruction to set the condition codes based on the
result.  In the event of an overflow, the V flag is set, triggering a
conditional move depending on the V flag status.

For example, set "1" to "r0" in case of overflow:

        mov_s   r0,1
        mpy.f   r0,r0,r1
        j_s.d   [blink]
        mov.nv  r0,0
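
The assembly above corresponds to source along these lines.  This is a
sketch only: the function name mul_overflow_flag is hypothetical, and
whether a given compiler version emits exactly that sequence is not
guaranteed:

```cpp
#include <cstdint>

// Returns 1 if a * b overflows a 32-bit signed int, 0 otherwise.
inline int mul_overflow_flag(std::int32_t a, std::int32_t b) noexcept
{
  std::int32_t product;  // receives the wrapped result; unused here
  return __builtin_mul_overflow(a, b, &product) ? 1 : 0;
}
```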

gcc/ChangeLog:

* config/arc/arc.md (<su_optab>mulvsi4): New define_expand.
(<su_optab>mulsi3_Vcmp): New define_insn.

Signed-off-by: Luis Silva <luiss@synopsys.com>

arc: Add commutative multiplication patterns

This patch introduces two new instruction patterns:

    `*mulsi3_cmp0`: This pattern performs a multiplication and sets
    the CC_Z register based on the result, while also storing the
    result of the multiplication in a general-purpose register.

    `*mulsi3_cmp0_noout`: This pattern performs a multiplication and
    sets the CC_Z register based on the result without storing the
    result in a general-purpose register.

These patterns are optimized to generate code using the `mpy.f`
instruction, specifically used where the result is compared to zero.

In addition, the previous commutative multiplication implementation
was removed.  It incorrectly took into account the negative flag,
which is wrong.  This new implementation only considers the zero flag.

A test case has been added to verify the correctness of these changes.

gcc/ChangeLog:

* config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
results compared against zero, selecting CC_Zmode.
* config/arc/arc.md (*mulsi3_cmp0): New define_insn.
(*mulsi3_cmp0_noout): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/arc/mult-cmp0.c: New test.

Signed-off-by: Luis Silva <luiss@synopsys.com>

arc: testsuite: Scan rlc instead of mov.hs

Due to the patch by Roger Sayle,
09881218137f4af9b7c894c2d350cf2ff8e0ee23, which introduces the use of
the `rlc rX,0` instruction in place of the `mov.hs`, the add overflow
test case needs to be updated. The previous test case was validating
the `mov.hs` instruction, but now it must validate the `rlc`
instruction as the new behavior.

gcc/testsuite/ChangeLog:

* gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.

Signed-off-by: Luis Silva <luiss@synopsys.com>

ARC: Use intrinsics for __builtin_sub_overflow*()

This patch covers signed and unsigned subtractions.  The generated code
would be something along these lines:

signed:
  sub.f   r0, r1, r2
  b.v     @label

unsigned:
  sub.f   r0, r1, r2
  b.c     @label
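
The signed and unsigned cases differ only in which condition is tested
(overflow flag versus carry/borrow).  A sketch of source that exercises
both, with hypothetical wrapper names around the GCC builtin:

```cpp
#include <cstdint>

// Signed: reports overflow of the two's-complement range.
inline bool sub_overflows(std::int32_t a, std::int32_t b,
                          std::int32_t& out) noexcept
{ return __builtin_sub_overflow(a, b, &out); }

// Unsigned: reports a borrow, i.e. b > a; out holds the wrapped result.
inline bool usub_overflows(std::uint32_t a, std::uint32_t b,
                           std::uint32_t& out) noexcept
{ return __builtin_sub_overflow(a, b, &out); }
```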

gcc/

* config/arc/arc.md (subsi3_v, subvsi4, subsi3_c): New patterns.

gcc/testsuite/

* gcc.target/arc/overflow-2.c: New file.

ARC: Use intrinsics for __builtin_add_overflow*()

This patch covers signed and unsigned additions.  The generated code
would be something along these lines:

signed:
  add.f   r0, r1, r2
  b.v     @label

unsigned:
  add.f   r0, r1, r2
  b.c     @label

gcc/

* config/arc/arc-modes.def (CC_V): New mode.
* config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New
function declaration.
* config/arc/arc.cc (arc_gen_unlikely_cbranch): New
function.
(get_arc_condition_code): Handle new mode.
* config/arc/arc.md (addvsi3_v, addvsi4, addsi3_c, uaddvsi4): New
patterns.
* config/arc/predicates.md (proper_comparison_operator): Handle
the new V_mode.
(equality_comparison_operator): Likewise.

gcc/testsuite/

* gcc.target/arc/overflow-1.c: New file

diagnostics: Mark path_label::get_effects as final override

When compiling diagnostic-path-output.cc with clang, it warns that
path_label::get_effects should be marked as override. That looks like
a good idea and from a brief look I also believe it should be marked
as final (the other override in the class is marked as both), so this
patch does that.

Likewise for html_output_format::after_diagnostic in
diagnostic-format-html.cc which also already has quite a few member
functions marked as final override.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* diagnostic-path-output.cc (path_label::get_effects): Mark as
final override.
* diagnostic-format-html.cc
(html_output_format::after_diagnostic): Likewise.

ranger-op: Use CFN_ constant instead of plain BUILTIN_ one

When compiling gimple-range-op.cc, clang issues a warning:

gimple-range-op.cc:1419:18: warning: comparison of different enumeration types in switch statement ('combined_fn' and 'built_in_function') [-Wenum-compare-switch]

which I hope is harmless, but all other switch cases use CFN_ prefixed
constants, so I guess the ISINF case should too.

gcc/ChangeLog:

2025-06-23 Martin Jambor <mjambor@suse.cz>

* gimple-range-op.cc
(gimple_range_op_handler::maybe_builtin_call): Use
CFN_BUILT_IN_ISINF instead of BUILT_IN_ISINF.

value-relation.h: Mark dom_oracle::next_relation as override

When GCC is compiled with clang, it emits a warning that
dom_oracle::next_relation is not marked as override even though it
does override a virtual function of its ancestor. This patch marks it
as such to silence the warning and for the sake of consistency.

There are other member functions in the class which are marked as
final override but this particular function is in the protected
section so I decided to just mark it as override.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* value-relation.h (class dom_oracle): Mark member function
next_relation as override.

tree-ssa-propagate.h: Mark two functions as override

When tree-ssa-propagate.h is compiled with clang, it complains that
member functions value_of_expr and range_of_expr of class
substitute_and_fold_engine are not marked as override even though they
do override virtual functions of the ancestor class.  This patch
merely adds the keyword to silence the warning and for consistency's
sake.

I did not make this part of the previous patch because I wanted to
point out that the first case is quite unusual, a virtual function
with a functional body (range_query::value_of_expr) is being
overridden with a pure virtual function.  I assume it was a conscious
decision but adding the override keyword seems even more important
then.
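
The unusual situation described above can be reproduced in isolation with the sketch below. All class names are hypothetical stand-ins, not the real GCC classes; the point is only that a virtual function with a body may legally be overridden by a pure virtual one, which makes the intermediate class abstract again.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a base class whose virtual function has a body.
struct range_query_sketch {
  virtual ~range_query_sketch() = default;
  virtual int value_of_expr(int expr) { return expr; }
};

// Overrides the function with a pure virtual one; "override" documents
// that this is an override and not a new, unrelated virtual function.
struct engine_sketch : range_query_sketch {
  int value_of_expr(int expr) override = 0;
};

// A concrete class must provide the implementation again.
struct doubling_engine : engine_sketch {
  int value_of_expr(int expr) override { return 2 * expr; }
};

static_assert(!std::is_abstract_v<range_query_sketch>, "base is concrete");
static_assert(std::is_abstract_v<engine_sketch>, "engine is abstract again");

// Dispatches through the base class interface.
int evaluate(range_query_sketch *query, int expr) {
  assert(query != nullptr);
  return query->value_of_expr(expr);
}

int evaluate_with_doubling_engine(int expr) {
  doubling_engine engine;
  return evaluate(&engine, expr);
}
```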

gcc/ChangeLog:

2025-06-24  Martin Jambor  <mjambor@suse.cz>

* tree-ssa-propagate.h (class substitute_and_fold_engine): Mark
member functions value_of_expr and range_of_expr as override.

ranger: Mark several member functions as final override

When GCC is built with clang, it emits warnings that several member
functions of various ranger classes override a virtual function of an
ancestor but are not marked with the override keyword.  After
inspecting the cases, I found that all these classes had other member
functions marked as final override, so I added the final keyword
everywhere too.

In some cases other such overrides were not explicitly marked as
virtual, which made formatting easier.  For that reason and also for
consistency, in such cases I removed the virtual keyword from the
functions I marked as final override too.
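
As an illustration of the resulting style, here is a minimal sketch with hypothetical class names (the real ranger classes are considerably larger). The overriding function carries no leading `virtual`, and `final override` both asserts that it overrides a base-class virtual and forbids further overriding.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for a range operator base class.
struct range_operator_sketch {
  virtual ~range_operator_sketch() = default;
  virtual bool overflow_free_p(int, int) const { return false; }
};

struct operator_plus_sketch : range_operator_sketch {
  // No "virtual" keyword here: "final override" already implies it.
  bool overflow_free_p(int lhs, int rhs) const final override {
    const long long wide = static_cast<long long>(lhs) + rhs;
    return wide >= INT_MIN && wide <= INT_MAX;
  }
};

// Calls the member function through the base class interface.
bool query_overflow_free(const range_operator_sketch *op, int lhs, int rhs) {
  assert(op != nullptr);
  return op->overflow_free_p(lhs, rhs);
}

bool plus_is_overflow_free(int lhs, int rhs) {
  operator_plus_sketch plus;
  return query_overflow_free(&plus, lhs, rhs);
}
```

If the base-class signature ever changes, the `override` keyword turns the silent mismatch into a compile error, which is the main practical benefit beyond silencing the clang warning.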

gcc/ChangeLog:

2025-06-24  Martin Jambor  <mjambor@suse.cz>

* range-op-mixed.h (class operator_plus): Mark member function
overflow_free_p as final override.
(class operator_minus): Likewise.
(class operator_mult): Likewise.
* range-op-ptr.cc (class pointer_plus_operator): Mark member
function lhs_op1_relation as final override.
* range-op.cc (class operator_div): Mark member functions
op2_range and update_bitmask as final override.
(class operator_logical_and): Mark member functions fold_range,
op1_range and op2_range as final override.  Remove unnecessary
virtual.
(class operator_logical_or): Likewise.
(class operator_logical_not): Mark member functions fold_range and
op1_range as final override.  Remove unnecessary virtual.
(class operator_absu): Mark member function wi_fold as final
override.

coroutines: Remove unused private member in cp_coroutine_transform

When building GCC with clang, it warns that the private member suffix
in class cp_coroutine_transform (defined in gcc/cp/coroutines.h) is
not used which indeed looks like it is the case. This patch therefore
removes it.

gcc/cp/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* coroutines.h (class cp_coroutine_transform): Remove member
orig_fn_body.

Mark pass_sccopy gate and execute functions as final override

It is customary to mark the gate and execute functions of the classes
representing passes as final override but this is missing in
pass_sccopy. This patch adds it which also silences clang warnings
about it.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* gimple-ssa-sccopy.cc (class pass_sccopy): Mark member functions
gate and execute as final override.

Mark rtl_avoid_store_forwarding functions final override

It is customary to mark the gate and execute functions of the classes
representing passes as final override but this is missing in
pass_rtl_avoid_store_forwarding. This patch adds it which also
silences a clang warning about it.

gcc/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* avoid-store-forwarding.cc (class
pass_rtl_avoid_store_forwarding): Mark member function gate as
final override.

Remove unused vector in value-relation.cc.

The relation_to_code vector in value-relation is now unused, so we can
remove it.

* value-relation.cc (relation_to_code): Remove.

Promote verify_range to vrange.

Most range classes had a verify_range, but they were all private. Make it a
supported routine from vrange.

* value-range.cc (frange::verify_range): Constify.
(irange::verify_range): Constify.
* value-range.h (vrange::verify_range): New.
(irange::verify_range): Make public.
(prange::verify_range): Make public.
(prange::verify_range): Make public.
(value_range::verify_range): New.

get_bitmask is sometimes less refined.

get_bitmask intersects the current mask with a mask generated from the
range. If the 2 masks are incompatible, it currently returns UNKNOWN.
Instead, it should return the original mask or information is lost.

* value-range.cc (irange::get_bitmask): Return original mask if
result is unknown.
(assert_snap_result): New.
(test_irange_snap_bounds): New.
(range_tests_misc): Call test_irange_snap_bounds.

tree-optimization/109892 - SLP reduction of fma

The following adds the ability to vectorize a fma reduction pair
as SLP reduction (we cannot yet handle ternary association in
reduction vectorization).

PR tree-optimization/109892
* tree-vect-loop.cc (check_reduction_path): Handle fma.
(vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code
generation constraints.

* gcc.dg/vect/vect-reduc-fma-1.c: New testcase.
* gcc.dg/vect/vect-reduc-fma-2.c: Likewise.
* gcc.dg/vect/vect-reduc-fma-3.c: Likewise.

tree-optimization/120808 - SLP build with mixed .FMA/.FMS

The following allows SLP build to succeed when mixing .FMA/.FMS
in different lanes like we handle mixed plus/minus. This does not
yet address SLP pattern matching not being able to form
a FMADDSUB from this.

PR tree-optimization/120808
* tree-vectorizer.h (compatible_calls_p): Add flag to
indicate a FMA/FMS pair is allowed.
* tree-vect-slp.cc (compatible_calls_p): Likewise.
(vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator.
(vect_build_slp_tree_2): Handle calls in two-operator SLP build.
* tree-vect-slp-patterns.cc (compatible_complex_nodes_p):
Adjust.

* gcc.dg/vect/bb-slp-pr120808.c: New testcase.

ivopts: Change constant_multiple_of to expand aff nodes.

This changes the calls to tree_to_aff_combination in constant_multiple_of to
tree_to_aff_combination_expand along with associated plumbing of ivopts_data
and required cache.

This improves cases such as:

```c
void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) {
    for (unsigned long i = 0; i < end; i += step) {
        svst1(pg, p1, svld1_s32(pg, p2));
        p1 += step;
        p2 += step;
    }
}
```

Where ivopts previously didn't expand the SSA variables for the step increments
and so lacked the ability to group all the IVs and ended up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1w z31.s, p0/z, [x1]
add x4, x4, x2
st1w z31.s, p0, [x0]
add x1, x1, x2, lsl 2
add x0, x0, x2, lsl 2
cmp x3, x4
bhi .L3
.L1:
ret
```

After this change we end up with:

```
f:
cbz x3, .L1
mov x4, 0
.L3:
ld1w z31.s, p0/z, [x1, x4, lsl 2]
st1w z31.s, p0, [x0, x4, lsl 2]
add x4, x4, x2
cmp x3, x4
bhi .L3
.L1:
ret
```

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (constant_multiple_of): Change
tree_to_aff_combination to tree_to_aff_combination_expand and add
parameter to take ivopts_data.
(get_computation_aff_1): Change parameters and calls to include
ivopts_data.
(get_computation_aff): Ditto.
(get_computation_at): Ditto.
(get_debug_computation_at): Ditto.
(get_computation_cost): Ditto.
(rewrite_use_nonlinear_expr): Ditto.
(rewrite_use_address): Ditto.
(rewrite_use_compare): Ditto.
(remove_unused_ivs): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/adr_7.c: New test.

libstdc++: Test for %S precision for durations with integral representation.

Existing tests are extended to cover cases where no precision is specified,
or it is specified as zero. The precision value is ignored in all cases.

libstdc++-v3/ChangeLog:

* testsuite/std/time/format/precision.cc: New tests.

rtl-ssa: Rewrite process_uses_of_deleted_def [PR120745]

process_uses_of_deleted_def seems to have been written on the assumption
that non-degenerate phis would be explicitly deleted by an insn_change,
and that the function therefore only needed to delete degenerate phis.
But that was inconsistent with the rest of the code, and wouldn't be
very convenient in any case.

This patch therefore rewrites process_uses_of_deleted_def to handle
general phis.

I'm not aware that this fixes any issues in current code, but it is
needed to enable the rtl-ssa dce work that Ondřej and Honza are
working on.

gcc/
PR rtl-optimization/120745
* rtl-ssa/changes.cc (process_uses_of_deleted_def): Rewrite to
handle deletions of non-degenerate phis.

libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]

For month_day we incorrectly reported day information to be available, which led
to format_error being thrown from the call to formatter::format at runtime, instead
of making the call to format ill-formed.

The included tests cover most of the combinations of _ChronoParts and format
specifiers.

PR libstdc++/120650

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h
(formatter<chrono::month_day_last,_CharT>::parse): Call _M_parse with
only Month being available.
* testsuite/std/time/format/data_not_present_neg.cc: New test.

x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest

-mtune=intel is used to generate a single binary to run well on both big
core and small core, similar to hybrid CPUs. Update -mtune=intel to tune
for Diamond Rapids and Clearwater Forest, instead of Silvermont.

PR target/120815
* common/config/i386/i386-common.cc (processor_alias_table):
Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for
PROCESSOR_INTEL.
* config/i386/i386-options.cc (processor_cost_table): Replace
intel_cost with alderlake_cost.
* config/i386/x86-tune-costs.h (intel_cost): Removed.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat
PROCESSOR_INTEL like PROCESSOR_ALDERLAKE.
(ix86_adjust_cost): Likewise.
* doc/invoke.texi: Update -mtune=intel for Diamond Rapids and
Clearwater Forest.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

i386: Remove CLDEMOTE for clients

CLDEMOTE is not enabled on clients according to the SDM, which only mentions
that it will be enabled on Xeon and Atom servers. Remove it from client
processors starting with Alder Lake (where it was introduced).

gcc/ChangeLog:

* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS
as base to remove PTA_CLDEMOTE.
(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE
does not include that anymore.
* doc/invoke.texi: Update texi file.

RISC-V: Add Profiles RVA/B23S64 support.

This patch adds support for the RISC-V Profiles RVA23S64 and RVB23S64.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New Profiles.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-rva23s.c: New test.
* gcc.target/riscv/arch-rvb23s.c: New test.

Add -fauto-profile-inlining

This patch adds -fauto-profile-inlining which can be used to control
the auto-profile directed inlining.

gcc/ChangeLog:

* common.opt (fauto-profile-inlining): New.
* doc/invoke.texi (-fauto-profile-inlining): Document.
* ipa-inline.cc (inline_functions_by_afdo): Check
flag_auto_profile.
(early_inliner): Also do inline_functions_by_afdo with
!flag_early_inlining.

Remove early inlining from afdo pass

This patch removes early inlining from the afdo pass since all inlining should now
happen from the early inliner. I tested this on SPEC and there are 3 inlines
happening here which are blocked at early-inline time by hitting the large function
growth limit. We probably want to bypass that limit; I will look into that
incrementally.

This should make the non-inlined function profile merging hopefully easier.

It may still make sense to separate the afdo inliner from the early inliner to solve
the non-transitivity issues, which is not that hard to do with the current code
organization. However this should be a separate IPA pass rather than another
part of the afdo pass, since it can be conceptually separate.

gcc/ChangeLog:

* auto-profile.cc: Update toplevel comment.
(early_inline): Remove.
(auto_profile): Don't do early inlining.

Daily bump.

gcn: Fix glc vs. sc0 handling for scalar memory access

gfx942 still uses glc for scalar access ('s_...') and only uses
sc0/nt/sc1 for vector access.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_GLC_NAME): Fix and extend the
description in the comment.
* config/gcn/gcn.cc (print_operand): Extend the comment about
'G' and 'g'.
* config/gcn/gcn.md: Use 'glc' instead of %G where appropriate.

Fortran/OpenACC: Add Fortran support for acc_attach/acc_detach

While C/C++ have supported the acc_attach{,_async} and
acc_detach{,_finalize}{,_async} routines for a long time, the Fortran
API routines were only added in OpenACC 3.3.

Unfortunately, they cannot directly be implemented in the library as
GCC will introduce a temporary array descriptor in some cases, which
causes the attempted attachment to this temporary variable instead
of to the original one.

Therefore, those API routines are handled in a special way in the compiler.

gcc/fortran/ChangeLog:

* trans-stmt.cc (gfc_trans_call_acc_attach_detach): New.
(gfc_trans_call): Call it.

libgomp/ChangeLog:

* libgomp.texi (acc_attach, acc_detach): Update for Fortran
version.
* openacc.f90 (acc_attach{,_async}, acc_detach{,_finalize}{,_async}):
Add.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc-attach-detach-1.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-attach-detach-2.f90: New test.

RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate [PR119100]

This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f       v6,fa0
  vfmacc.vv      v2,v6,v4

After, we get only one:
  vfmacc.vf      v2,fa0,v4

PR target/119100

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Handle both add and
acc FMA variants.
* config/riscv/vector.md (*pred_mul_<optab><mode>_scalar_undef): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmacc and vfmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: Add support for acc
variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Define
TEST_OUT.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f64.c: New test.

Fortran: fix ICE in verify_gimple_in_seq with substrings [PR120743]

PR fortran/120743

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_substring): Substring indices are of
type gfc_charlen_type_node. Convert to size_type_node for
pointer arithmetic only after offset adjustments have been made.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr120743.f90: New test.

Co-authored-by: Jerry DeLisle <jvdelisle@gcc.gnu.org>
Co-authored-by: Mikael Morin <mikael@gcc.gnu.org>

c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]

The following patch implements the P3618R0 paper by tweaking pedwarn
condition, adjusting pedwarn wording, adjusting one testcase and adding 4
new ones.  The paper was voted in as DR, so it isn't guarded on C++ version.

2025-06-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/120773
* decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching
main to the global module.  Only pedwarn for current_lang_name
other than lang_name_cplusplus and adjust pedwarn wording.

* g++.dg/parse/linkage5.C: Don't expect error on
extern "C++" int main ();.
* g++.dg/parse/linkage7.C: New test.
* g++.dg/parse/linkage8.C: New test.
* g++.dg/modules/main-2.C: New test.
* g++.dg/modules/main-3.C: New test.

i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead

ADD/SUB is faster than LEA for most processors. Also, there are
several peephole2 patterns available that convert prologue esp
subtractions to pushes (at the end of i386.md). These process only
patterns with flags reg clobber, so they are ineffective
with clobber-less stack ptr adjustments, introduced by r16-1551
("x86: Enable separate shrink wrapping").

Introduce a peephole2 pattern that adds a clobber to clobber-less
stack ptr adjustments when FLAGS_REG is dead.

gcc/ChangeLog:

* config/i386/i386.md
(@pro_epilogue_adjust_stack_add_nocc<mode>): Add type attribute.
(pro_epilogue_adjust_stack_add_nocc peephole2 pattern):
Convert pro_epilogue_adjust_stack_add_nocc variant to
pro_epilogue_adjust_stack_add when FLAGS_REG is dead.

Remove non-SLP path from vectorizable_load

This cleans the rest of vectorizable_load from non-SLP, propagates
out ncopies == 1, and elides loops from 0 to ncopies.

* tree-vect-stmts.cc (vectorizable_load): Remove non-SLP
paths and propagate out ncopies == 1.

diagnostic: fix for older version of GCC

Having both an enum and a variable with the same name triggers an error with
gcc 5.

gcc/ChangeLog:
* diagnostic-state-to-dot.cc (get_color_for_dynalloc_state):
Rename argument dynalloc_state to dynalloc_st.
(add_title_tr): Rename argument style to styl.
(on_xml_node): Rename local variable dynalloc_state to dynalloc_st.

libstdc++: Unnecessary type completion in __is_complete_or_unbounded [PR120717]

When checking __is_complete_or_unbounded on a reference to incomplete
type, we overeagerly try to instantiate/complete the referenced type
which besides being unnecessary may also produce an unexpected
-Wsfinae-incomplete warning (added in r16-1527) if the referenced type
is later defined.

This patch fixes this by effectively restricting the sizeof check to
object (except unknown-bound array) types.  In passing simplify the
implementation by using is_object instead of is_function/reference/void
and introducing a __maybe_complete_object_type helper.
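
The idea can be sketched outside libstdc++ as follows. This is a simplified illustration under stated assumptions, not the library's implementation: every name below is hypothetical, and the sketch uses C++17 facilities freely. The `sizeof` probe is only reached for object types that are not arrays of unknown bound, so a reference to an incomplete type never triggers an attempt to complete it.

```cpp
#include <cstddef>
#include <type_traits>

struct opaque;  // deliberately never defined in this sketch

// Object types other than arrays of unknown bound are the only ones for
// which asking sizeof is meaningful.
template <typename T>
struct maybe_complete_object_sketch
    : std::bool_constant<std::is_object_v<T> &&
                         !(std::is_array_v<T> && std::extent_v<T> == 0)> {};

// Detects whether sizeof(T) is well-formed, i.e. whether T is complete.
template <typename T, typename = void>
struct has_known_size : std::false_type {};

template <typename T>
struct has_known_size<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

// References, functions, void and unknown-bound arrays are accepted
// without ever touching sizeof.
template <typename T>
constexpr bool is_complete_or_unbounded_sketch() {
  if constexpr (maybe_complete_object_sketch<T>::value)
    return has_known_size<T>::value;
  else
    return true;
}
```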

PR libstdc++/120717

libstdc++-v3/ChangeLog:

* include/std/type_traits (__maybe_complete_object_type): New
helper trait, factored out from ...
(__is_complete_or_unbounded): ... here.  Only check sizeof on a
__maybe_complete_object_type type.  Fix formatting.
* testsuite/20_util/is_complete_or_unbounded/120717.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

gcc: remove atan from edom_only_function

According to the man page, atan does not produce an error. According to the C23
standard draft (N3088), a range error occurs for atan if a nonzero x is too
close to zero. Neither of them mentions that atan will result in a domain error.
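
A quick way to observe this on a given C library is sketched below. The helper name is hypothetical, and the result reflects only the library it is run against, so it is a sanity check rather than a proof about the standard.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Returns true if calling atan on x leaves errno set to EDOM.
bool atan_reports_edom(double x) {
  assert(!std::isnan(x));  // NaN inputs are out of scope for this check
  errno = 0;
  volatile double result = std::atan(x);  // volatile keeps the call alive
  (void)result;
  return errno == EDOM;
}
```

Since atan is defined for every real argument, including infinities, no input should make the helper return true.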

gcc/ChangeLog:

* tree-call-cdce.cc (edom_only_function): Remove atan.

Signed-off-by: Yuao Ma <c8ef@outlook.com>

s390: Fix float vector extract for pre-z13

Also provide the vec_extract patterns for floats on pre-z13 machines
to prevent ICEing in those cases.

gcc/ChangeLog:

* config/s390/vector.md (VF): Don't restrict modes.
(VEC_SET_SINGLEFLOAT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extract-1.c: Fix test on arch11.
* gcc.target/s390/vector/vec-set-1.c: Run test on arch11.
* gcc.target/s390/vector/vec-extract-2.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

AArch64: promote aarch64-autovec-preference to mautovec-preference

As requested in my patch for -mmax-vectorization this promotes the parameter
--param aarch64-autovec-preference to a first class top target flag.

If both the parameter and the flag are specified the parameter takes precedence
with the reasoning that it may already be embedded in build systems.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
value of parameter based on option.
* config/aarch64/aarch64.opt (autovec-preference): New.
* doc/invoke.texi (autovec-preference): Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/autovec_param_asimd-only_2.c: New test.
* gcc.target/aarch64/autovec_param_default_2.c: New test.
* gcc.target/aarch64/autovec_param_prefer-asimd_2.c: New test.
* gcc.target/aarch64/autovec_param_prefer-sve_2.c: New test.
* gcc.target/aarch64/autovec_param_sve-only_2.c: New test.

AArch64: propose -mmax-vectorization as an option to override vector costing

With the middle-end providing a way to make vectorization more profitable by
scaling vect-scalar-cost-multiplier, this adds a more user-friendly option
that is easier to use.

I propose making it an actual -m option that we document and retain vs using
the parameter name. In the future I would like to extend this option to modify
additional costing in the AArch64 backend itself.

This can be used together with --param aarch64-autovec-preference to get the
vectorizer to say, always vectorize with SVE. I did consider making this an
additional enum to --param aarch64-autovec-preference but I also think this is
a useful thing to be able to set with pragmas and attributes, but am open to
suggestions.

Note that as a follow up I plan on extending -fdump-tree-vect to support -stats
which is then intended to be usable with this flag.

gcc/ChangeLog:

* config/aarch64/aarch64.opt (max-vectorization): New.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Save
and restore option.
Implement it through vect-scalar-cost-multiplier.
(aarch64_attributes): Default to off.
* common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
Initialize option.
* doc/extend.texi (max-vectorization): Document attribute.
* doc/invoke.texi (max-vectorization): Document flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cost_model_17.c: New test.
* gcc.target/aarch64/sve/cost_model_18.c: New test.

fortran: Mention user variable in SELECT TYPE temporary variable names

The temporary variables that are generated to implement SELECT TYPE
and TYPE IS statements have (before this change) a name depending only
on the type.  This can produce confusing dumps with code having multiple
SELECT TYPE statements, as it isn't obvious which SELECT TYPE construct
the variable relates to.  This is especially the case with nested SELECT
TYPE statements and with SELECT TYPE variables having identical types
(and thus identical names).

This change adds one additional user-provided discriminating string in
the variable names, using the value from the SELECT TYPE variable name
or last component reference name.  The additional string may be
truncated to fit in the temporary buffer.  This requires all buffers to
have matching sizes to get the same resulting name everywhere.

gcc/fortran/ChangeLog:

* misc.cc (gfc_var_name_for_select_type_temp): New function.
* gfortran.h (gfc_var_name_for_select_type_temp): Declare it.
* resolve.cc (resolve_select_type): Pick a discriminating name
from the SELECT TYPE variable reference and use it in the name
of the temporary variable that is generated.  Truncate name to
the buffer size.
* match.cc (select_type_set_tmp): Likewise.  Pass the
discriminating name...
(select_intrinsic_set_tmp): ... to this function.  Use the
discriminating name likewise.  Augment the buffer size to match
that of select_type_set_tmp and resolve_select_type.

gcc/testsuite/ChangeLog:

* gfortran.dg/select_type_51.f90: New test.

Don't duplicate setup code cost when doing group-candidate cost calculation.

- /* Uses in a group can share setup code, so only add setup cost once. */
- cost -= cost.scratch;

It looks like the original code took into account avoiding double
counting, but unfortunately cost is reset inside the following loop, which
invalidates the code above and makes the same setup code cost duplicated in
each use of the group.

The patch fixes the issue. It can also improve 548.exchange_r by 6% with
-march=x86-64-v3 -O2 due to better ivopts on EMR.

No big performance impact for SPEC2017 on graviton4/SPR with -mcpu=native
-Ofast -fomit-frame-pointer -flto=auto.

gcc/ChangeLog:

PR target/115842
* tree-ssa-loop-ivopts.cc (determine_group_iv_cost_address):
Don't recalculate inv_expr when doing group-candidate cost
calculation.

middle-end: Apply loop->unroll directly in vectorizer

Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

Which today is vectorized and then unrolled 3x by the RTL unroller due to the
use of the pragma.  This is unfortunate because the pragma was intended for the
scalar loop but we end up with an unrolled vector loop and a longer path to the
entry which has a low enough VF requirement to enter.

This patch instead seeds the suggested_unroll_factor with the value the user
requested and instead uses it to maintain the total VF that the user wanted the
scalar loop to maintain.

In effect it applies the unrolling inside the vector loop itself.  This has
benefits for things like reductions, as it allows us to split the accumulator
and so the unrolled loop is more efficient.  For early-break it allows the
cbranch call to be shared between the unrolled elements, giving you more
effective unrolling because it doesn't need the repeated cbranch which can be
expensive.

The target can then choose to create multiple epilogues to deal with the "rest".

The example above now generates:

.L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. pragma unroll 8 generates:

.L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4

gcc/ChangeLog:

* doc/extend.texi: Document pragma unroll interaction with vectorizer.
* tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_analyze_loop_1): Set
suggested_unroll_factor and retry.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/unroll-vect.c: New test.

middle-end: replace log_vf usages with vf to allow support for non-power of two vf

This patch fixes a bug where the current code assumed that exact_log2 returns
NULL on failure, but it instead returns -1.  So there are some cases where the
right shift could shift out the entire value.

Secondly it also removes the requirement that VF be a power of two.  With an
uneven unroll factor we can easily end up with a non-power of two VF which SLP
can handle. This replaces shifts with multiplication and division.
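
The difference can be sketched in isolation as below. `exact_log2_sketch` is a hypothetical stand-in written for this note that mirrors the behaviour described above (log2 for powers of two, -1 otherwise); it is not GCC's implementation, and `vector_iterations` is likewise illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Returns log2(x) when x is a power of two, and -1 otherwise.
int exact_log2_sketch(std::uint64_t x) {
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while ((x >>= 1) != 0)
    ++log;
  return log;
}

// Number of full vector iterations, computed by division so that it is
// valid for any positive vf, not only powers of two.  A shift by
// exact_log2_sketch(vf) would be invalid whenever that returns -1.
std::uint64_t vector_iterations(std::uint64_t niters, std::uint64_t vf) {
  assert(vf > 0);
  return niters / vf;
}
```

For a power-of-two vf the division and the shift agree, so the change only alters behaviour in the cases the shift could not express.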

The 32-bit x86 testcase from PR64110 was always wrong, it used to match by pure
coincidence a vmovd inside the vector loop.  What it intended to match was that
the argument to the function isn't spilled and then reloaded from the stack for
no reason.

But on 32-bit x86 all arguments are passed on the stack anyway and so the match
would have never worked.  The patch seems to simplify the loop preheader which
gets it to remove an intermediate zero extend which causes the match to now
properly fail.

As such I'm skipping the test on 32-bit x86.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters,
vect_gen_vector_loop_niters_mult_vf): Remove uses of log_vf.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr64110.c: Update testcase.

x86: Extend the remove_redundant_vector pass

Extend the remove_redundant_vector pass to handle vector broadcasts from
constant and variable scalars.  When broadcasting from constants and
function arguments, we can place a single widest vector broadcast at
entry of the nearest common dominator for basic blocks with all uses
since constants and function arguments aren't changed.  For broadcast
from variables with a single definition, the single definition is
replaced with the widest broadcast.

gcc/

PR target/92080
* config/i386/i386-expand.cc (ix86_expand_call): Set
recursive_function to true for recursive call.
* config/i386/i386-features.cc (ix86_place_single_vector_set):
Add an argument for inner scalar, default to nullptr.  Set the
source from inner scalar if not nullptr.
(ix86_get_vector_load_mode): Renamed to ...
(ix86_get_vector_cse_mode): This.  Add an argument for scalar mode
and handle integer and float scalar modes.
(replace_vector_const): Add an argument for scalar mode and pass
it to ix86_get_vector_load_mode.
(x86_cse_kind): New.
(redundant_load): Likewise.
(ix86_broadcast_inner): Likewise.
(remove_redundant_vector_load): Also support const0_rtx and
constm1_rtx broadcasts.  Handle vector broadcasts from constant
and variable scalars.
* config/i386/i386.h (machine_function): Add recursive_function.

gcc/testsuite/

* gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect
movdqa instead of pxor.
* gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
* gcc.target/i386/pr92080-4.c: New test.
* gcc.target/i386/pr92080-5.c: Likewise.
* gcc.target/i386/pr92080-6.c: Likewise.
* gcc.target/i386/pr92080-7.c: Likewise.
* gcc.target/i386/pr92080-8.c: Likewise.
* gcc.target/i386/pr92080-9.c: Likewise.
* gcc.target/i386/pr92080-10.c: Likewise.
* gcc.target/i386/pr92080-11.c: Likewise.
* gcc.target/i386/pr92080-12.c: Likewise.
* gcc.target/i386/pr92080-13.c: Likewise.
* gcc.target/i386/pr92080-14.c: Likewise.
* gcc.target/i386/pr92080-15.c: Likewise.
* gcc.target/i386/pr92080-16.c: Likewise.
* gcc.target/i386/pr92080-17.c: Likewise.
* gcc.target/i386/pr92080-18.c: Likewise.
* gcc.target/i386/pr92080-19.c: Likewise.
* gcc.target/i386/pr92080-20.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Update memcpy/memset inline strategies for -mtune=generic

Update memcpy and memset inline strategies for -mtune=generic:

1. Don't align memory.
2. For known sizes, prefer vector loop, unroll loop with 4 moves or
   stores per iteration without aligning the loop, up to 256 bytes.
3. For unknown sizes, use memcpy/memset.
4. Since each loop iteration has 4 stores and zeroing with an unrolled
   loop may need 8 stores, change CLEAR_RATIO to 10 so that zeroing of
   up to 72 bytes is fully unrolled with 9 stores without SSE.
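
A minimal sketch of the two cases the list distinguishes, using hypothetical
names: a clear with a size known at compile time (72 bytes, the limit
mentioned in item 4) and a clear whose size is only known at run time, which
per item 3 would be left as a memset call.  The code actually generated
depends on the compiler version and flags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 72-byte object: the size is a compile-time constant.  */
struct packet
{
  unsigned char bytes[72];
};

/* Known size: a candidate for inline expansion into plain stores.  */
void
clear_packet (struct packet *p)
{
  memset (p, 0, sizeof *p);
}

/* Unknown size: expected to stay a call to the library memset.  */
void
clear_buffer (void *buf, size_t n)
{
  memset (buf, 0, n);
}
```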

gcc/

PR target/70308
PR target/101366
PR target/102294
PR target/108585
PR target/118276
PR target/119596
PR target/119703
PR target/119704
* config/i386/x86-tune-costs.h (generic_memcpy): Updated.
(generic_memset): Likewise.
(generic_cost): Change CLEAR_RATIO to 10.

gcc/testsuite/

PR target/70308
PR target/101366
PR target/102294
PR target/108585
PR target/118276
PR target/119596
PR target/119703
PR target/119704
* g++.target/i386/memset-pr101366-1.C: New test.
* g++.target/i386/memset-pr101366-2.C: Likewise.
* g++.target/i386/memset-pr108585-1a.C: Likewise.
* g++.target/i386/memset-pr108585-1b.C: Likewise.
* g++.target/i386/memset-pr118276-1a.C: Likewise.
* g++.target/i386/memset-pr118276-1b.C: Likewise.
* g++.target/i386/memset-pr118276-1c.C: Likewise.
* gcc.target/i386/memcpy-strategy-12.c: Likewise.
* gcc.target/i386/memcpy-strategy-13.c: Likewise.
* gcc.target/i386/memset-pr70308-1a.c: Likewise.
* gcc.target/i386/memset-pr70308-1b.c: Likewise.
* gcc.target/i386/memset-strategy-25.c: Likewise.
* gcc.target/i386/memset-strategy-26.c: Likewise.
* gcc.target/i386/memset-strategy-27.c: Likewise.
* gcc.target/i386/memset-strategy-28.c: Likewise.
* gcc.target/i386/memset-strategy-29.c: Likewise.
* gcc.target/i386/memset-strategy-30.c: Likewise.
* gcc.target/i386/memset-strategy-31.c: Likewise.
* gcc.target/i386/auto-init-padding-3.c: Expect XMM stores.
* gcc.target/i386/auto-init-padding-9.c: Likewise.
* gcc.target/i386/mvc17.c: Fail with "rep mov"
* gcc.target/i386/pr111657-1.c: Scan for unrolled loop.  Fail
with "rep mov".
* gcc.target/i386/shrink_wrap_1.c: Also pass
-mmemset-strategy=rep_8byte:-1:align.
* gcc.target/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Copy discriminators when inlining

When inlining, discriminator info about the call statement is lost, which
is not good for auto-profile and debug info quality.  This patch fixes
it.

gcc/ChangeLog:

* tree-inline.cc (expand_call_inline): Preserve discriminator.

Fix AFDO zero profile handling

This patch fixes roms autofdo regression I introduced yesterday.  What happens
is that loop vectorization is disabled, because we get loop header count 0.
I.e.

loop_header:  <count 0>
  if (i < n)
    goto exit;
loop_body:    <count large>
  ... vectorizable computation ...

The reason is that the "if (i < n)" statement actually has 0 profile in AFDO
feedback.  This seems common and I believe it is an issue with debug info in
the loop vectorizer.  Because the loop is vectorized during the train run, the
conditional is replaced by the vectorized loop conditional but the statement
remains in the loop epilogue which is not executed at runtime.

This is something we can fix by introducing a debug statement in the vectorized
loop body so the user can set a breakpoint on it.  I will try to produce a
testcase for that.

However this patch fixes a bug where I intended to only trust 0 counts from
AFDO if they are also 0 in static profile and reversed the conditional.
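
The intended rule can be sketched as below.  This is not the code from
auto-profile.cc; the function and parameter names are hypothetical and only
illustrate the condition described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether the AFDO count of a block should be
   trusted.  A zero AFDO count is only believed when the static profile
   also says the block is never executed; a non-zero AFDO count is always
   taken as is.  */
bool
trust_afdo_count (uint64_t afdo_count, uint64_t static_count)
{
  return afdo_count != 0 || static_count == 0;
}
```

With the conditional reversed, a loop header with AFDO count 0 but non-zero
static count would be treated as never executed, which matches the symptom
described above.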

autoprofile-bootstrapped/regtested x86_64-linux, committed.

* auto-profile.cc (afdo_set_bb_count): Dump also 0 count stmts.
(afdo_annotate_cfg): Fix conditional for block having non-zero static
profile.

Fix shrink wrap separate ICE for mingw [PR120741]

gcc/ChangeLog:

PR target/120741
* config/i386/i386.cc (ix86_expand_prologue):
Remove 1 assertion.

gcc/testsuite/ChangeLog:

PR target/120741
* gcc.target/i386/pr120741.c: New test.
* gcc.target/i386/shrink-wrap-separate-mingw.c: Likewise.

[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

Fix typo in comment spotted by Peter B.

PR target/118241
gcc/
* config/riscv/predicates.md: Fix comment typo in recent change.

Daily bump.

Fixup dropping REG_EQUAL note in ext-dce

Followup to r16-1613-g34e1e5e33ec3eb. remove_reg_equal_equiv_notes's
2nd argument is 'no_rescan' which we accidentally had on, tripping
an assert in combine or ira because we hadn't left things in a consistent
state.

Fix the thinko by enabling rescanning.

gcc/ChangeLog:
PR rtl-optimization/120795

* ext-dce.cc (ext_dce_try_optimize_insn): Enable rescan in
remove_reg_equal_equiv_notes call.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

libgdiagnostics: sarif-replay: add extra sinks via -fdiagnostics-add-output= [PR116792,PR116163]

This patch refactors the support for -fdiagnostics-add-output=SCHEME
from GCC's options parsing so that it is also available to
sarif-replay and to other clients of libgdiagnostics.

With this, users of sarif-replay and other such tools can generate HTML
or SARIF as well as text output, using the same
  -fdiagnostics-add-output=SCHEME
as GCC.

As a test, the patch adds support for this option to the dg-lint
script below "contrib".  For example dg-lint can now generate text,
html, and sarif output via:

  LD_LIBRARY_PATH=../build/gcc/ \
    ./contrib/dg-lint/dg-lint \
      contrib/dg-lint/test-*.c \
      -fdiagnostics-add-output=experimental-html:file=dg-lint-tests.html \
      -fdiagnostics-add-output=sarif:file=dg-lint-tests.sarif

where the HTML output from dg-lint can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-20/dg-lint-tests.html
the sarif output here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/dg-lint-tests.sarif
and a screenshot of VS Code viewing the sarif output is here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/vscode-viewing-dg-lint-sarif-output.png

As well as allowing sarif-replay to generate HTML, this patch allows
sarif-replay to also generate SARIF.  Ideally this would faithfully
round-trip all the data, but it's not perfect (which I'm tracking as
PR sarif-replay/120792).

contrib/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* dg-lint/dg-lint: Add -fdiagnostics-add-output.
* dg-lint/libgdiagnostics.py: Add
diagnostic_manager_add_sink_from_spec.
(Manager.add_sink_from_spec): New.

gcc/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* Makefile.in (OBJS-libcommon): Add diagnostic-output-spec.o.
* diagnostic-format-html.cc (html_builder::html_builder): Ensure
title is non-empty.
* diagnostic-output-spec.cc: New file, taken from material in
opts-diagnostic.cc.
* diagnostic-output-spec.h: New file.
* diagnostic.cc (diagnostic_context::set_main_input_filename):
New.
* diagnostic.h (diagnostic_context::set_main_input_filename): New
decl.
* doc/libgdiagnostics/topics/compatibility.rst
(LIBGDIAGNOSTICS_ABI_2): New.
* doc/libgdiagnostics/topics/diagnostic-manager.rst
(diagnostic_manager_add_sink_from_spec): New.
(diagnostic_manager_set_analysis_target): New.
* libgdiagnostics++.h (manager::add_sink_from_spec): New.
(manager::set_analysis_target): New.
* libgdiagnostics.cc: Include "diagnostic-output-spec.h".
(struct spec_context): New.
(diagnostic_manager_add_sink_from_spec): New.
(diagnostic_manager_set_analysis_target): New.
* libgdiagnostics.h
(LIBDIAGNOSTICS_HAVE_diagnostic_manager_add_sink_from_spec): New
define.
(diagnostic_manager_add_sink_from_spec): New decl.
(LIBDIAGNOSTICS_HAVE_diagnostic_manager_set_analysis_target): New
define.
(diagnostic_manager_set_analysis_target): New decl.
* libgdiagnostics.map (LIBGDIAGNOSTICS_ABI_2): New.
* libsarifreplay.cc (sarif_replayer::handle_artifact_obj): Look
for "analysisTarget" in roles and call set_analysis_target using
the artifact if found.
* opts-diagnostic.cc: Refactor, moving material to
diagnostic-output-spec.cc.
(struct opt_spec_context): New.
(handle_OPT_fdiagnostics_add_output_): Use opt_spec_context.
(handle_OPT_fdiagnostics_set_output_): Likewise.
* sarif-replay.cc: Define INCLUDE_STRING.
(struct options): Add m_extra_output_specs.
(usage_msg): Add -fdiagnostics-add-output=SCHEME.
(str_starts_with): New.
(parse_options): Add -fdiagnostics-add-output=SCHEME.
(main): Likewise.
* selftest-run-tests.cc (selftest::run_tests): Call
diagnostic_output_spec_cc_tests rather than
opts_diagnostic_cc_tests.
* selftest.h (selftest::diagnostic_output_spec_cc_tests):
Replace...
(selftest::opts_diagnostic_cc_tests): ...this.

gcc/testsuite/ChangeLog:
PR other/116792
PR testsuite/116163
PR sarif-replay/120792
* sarif-replay.dg/2.1.0-valid/signal-1-check-html.py: New test
script.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Add html and sarif
generation to options.  Invoke the new script to verify that HTML
and SARIF are generated.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix missing "final override"

No functional change intended.

gcc/analyzer/ChangeLog:
* region-model.cc
(exception_thrown_from_unrecognized_call::print): Add
"final override" to vfunc.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

OpenACC: Add 'if' clause to 'acc wait' directive

OpenACC 3.0 added the 'if' clause to four directives; this patch only adds
it to 'acc wait'.
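
A minimal usage sketch, with a hypothetical function name: the wait on async
queue 1 only happens when the 'if' condition is true.  Without -fopenacc the
pragmas are ignored and the loop runs serially on the host.

```c
/* Hypothetical example: double every element of A on async queue 1, then
   wait for that queue only when SYNC is non-zero.  */
void
scale_by_two (float *a, int n, int sync)
{
#pragma acc parallel loop copy(a[0:n]) async(1)
  for (int i = 0; i < n; i++)
    a[i] *= 2.0f;

#pragma acc wait(1) if(sync)
}
```

When sync is zero the caller remains responsible for waiting on queue 1
before reading the results.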

gcc/c-family/ChangeLog:

* c-omp.cc (c_finish_oacc_wait): Handle if clause.

gcc/c/ChangeLog:

* c-parser.cc (OACC_WAIT_CLAUSE_MASK): Add if clause.

gcc/cp/ChangeLog:

* parser.cc (OACC_WAIT_CLAUSE_MASK): Add if clause.

gcc/fortran/ChangeLog:

* openmp.cc (OACC_WAIT_CLAUSES): Add if clause.
* trans-openmp.cc (gfc_trans_oacc_wait_directive): Handle it.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/acc-wait-1.c: New test.
* gfortran.dg/goacc/acc-wait-1.f90: New test.

Fortran: fix checking of renamed-on-use interface name [PR120784]

PR fortran/120784

gcc/fortran/ChangeLog:

* interface.cc (gfc_match_end_interface): If a use-associated
symbol is renamed, use the local_name for checking.

gcc/testsuite/ChangeLog:

* gfortran.dg/interface_63.f90: New test.

contrib: handle GDB's 'unexpected core files' count

This commit is for the benefit of GDB, but as the binutils-gdb
repository shares the contrib/ directory with gcc, this commit must
first be applied to gcc then copied back to binutils-gdb.

This commit extends the two scripts contrib/dg-extract-results.{py,sh}
to handle GDB's 'unexpected core files' count.  This test result type
should never appear in GCC, or any other tool that shares the contrib/
directory, so this change should be harmless for others.

The 'unexpected core files' count was added to GDB's results by this
series:

  https://inbox.sourceware.org/gdb-patches/20220623183053.172430-1-pedro@palves.net

This count is added to the gdb.sum file after all the tests have run,
and counts up any core.* files that have appeared.

GDB also has a make-check-all.sh script which runs a test with all the
different board files that GDB supports.  After each test is run the
'unexpected core files' count will be added to that board's results.

I'm now trying to use the dg-extract-results.* scripts to merge the
results from all the different board files, and the 'unexpected core
files' count is confusing these scripts.

contrib/ChangeLog:

* dg-extract-results.py: Handle GDB's unexpected core file count.
* dg-extract-results.sh: Likewise.

diagnostics: add state diagrams to analyzer experimental-html output [PR116792]

This patch adds various support for debugging diagnostic paths and
events, intended initially for myself to help with debugging -fanalyzer.

It adds the optional ability for a diagnostic_event to supply a
description of the predicted state of the program at that point along
the diagnostic_path.  To isolate the diagnostic subsystem from the
analyzer, this representation is currently an xml::document with custom
elements.  The XML representation is similar to the analyzer's internal
state but can be easier to read - for example, rather than storing the
contents of memory via byte offsets, it uses fields for structs and
element indexes for arrays, recursively.

These states are handled by the HTML and SARIF diagnostic sinks.

The SARIF sink simply embeds the XML as a string in a property bag of the
threadFlowLocation object (SARIF v2.1.0 section 3.38).

For HTML output, the "experimental-html" sink gains a new
"show-state-diagrams=yes" option i.e.:
  -fdiagnostics-add-output=experimental-html:show-state-diagrams=yes
which converts the state XML into SVG diagrams visualizing the state of
memory at each event, inspired by the "ddd" debugger.  These can be seen
by pressing 'j' and 'k' to single-step forward and backward through
events, making it *much* easier to debug -fanalyzer.

An example of output can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html
showing an issue in a singly-linked list; there are various other
examples in the parent directory.
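
For orientation, here is a small input of the kind such a diagram could
visualize.  It is a hypothetical example, not one of the testcases added by
this patch: a two-node singly-linked list where the second node is never
freed.  It would presumably be compiled with -fanalyzer together with the
-fdiagnostics-add-output= option shown above; which diagnostics are emitted
depends on the analyzer.

```c
#include <stdlib.h>

struct node
{
  struct node *next;
  int value;
};

/* Hypothetical example: build a two-node list, sum its values, and free
   only the first node.  */
int
sum_two (int a, int b)
{
  struct node *first = malloc (sizeof *first);
  if (!first)
    return -1;
  struct node *second = malloc (sizeof *second);
  if (!second)
    {
      free (first);
      return -1;
    }
  first->value = a;
  first->next = second;
  second->value = b;
  second->next = NULL;
  int total = first->value + first->next->value;
  free (first);		/* Bug: 'second' is leaked here.  */
  return total;
}
```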

Generating the SVG diagrams requires an invocation of "dot" per event,
so it noticeably slows down diagnostic emission, hence the opt-in
command-line flag.  However, I'm already finding bugs in -fanalyzer with
this that I hadn't seen before.

Given that the UI is rather clunky and there is lots of room for
improvement to the visualizations, for now this feature is marked
as being for GCC developers, not end-users.

The patch also adds a dot::ast_node class hierarchy to make it easy to
create GraphViz dot files with the correct escaping, and adds a C++
wrapper around pex adding some syntactic sugar for invoking
subprocesses.

gcc/ChangeLog:
PR other/116792
* Makefile.in (ANALYZER_OBJS): Add
analyzer/ana-state-to-diagnostic-state.o.
(OBJS): Move graphviz.o to...
(OBJS-libcommon): ...here.  Add diagnostic-state-to-dot.o and pex.o.
* diagnostic-format-html.cc: Include "diagnostic-state.h" and
"graphviz.h".
(html_generation_options::html_generation_options): Initialize the
new flags.
(HTML_SCRIPT): Add function "get_any_state_diagram".  Use it
when changing current focus id to update the visibility of the
pertinent diagram, if any.
(print_pre_source): New.
(html_builder::maybe_make_state_diagram): New.
(html_path_label_writer::html_path_label_writer): Add "path" param.
Initialize m_path and m_curr_event_id.
(html_path_label_writer::begin_label): Store current event id.
(html_path_label_writer::end_label): Attempt to make a state
diagram and add it if successful.
(html_path_label_writer::get_element_id): New.
(html_path_label_writer::m_path): New field.
(html_path_label_writer::m_curr_event_id): New field.
(html_builder::make_element_for_diagnostic): Pass path to label
writer.
* diagnostic-format-html.h
(html_generation_options::m_show_state_diagrams): New field.
(html_generation_options::m_show_state_diagram_xml): New field.
(html_generation_options::m_show_state_diagram_dot_src): New field.
* diagnostic-format-sarif.cc: Include "xml.h".
(populate_thread_flow_location_object): If requested, attempt to
generate xml state and add it to the property bag as
"gcc/diagnostic_event/xml_state" in xml source form.
(sarif_generation_options::sarif_generation_options): Initialize
m_xml_state.
* diagnostic-format-sarif.h
(sarif_generation_options::m_xml_state): New field.
* diagnostic-path.cc: Define INCLUDE_MAP.  Include "xml.h".
(diagnostic_event::maybe_make_xml_state): New.
* diagnostic-path.h (class xml::document): New forward decl.
(diagnostic_event::maybe_make_xml_state): New vfunc decl.
* diagnostic-state-to-dot.cc: New file.
* diagnostic-state.h: New file.
* digraph.cc: Define INCLUDE_STRING and INCLUDE_VECTOR.
* doc/analyzer.texi: Document state diagrams in html output.
(__analyzer_dump_dot): New.
(__analyzer_dump_xml): New.
* doc/invoke.texi (sarif): Add "xml-state" key.
(experimental-html): Add keys "show-state-diagrams",
"show-state-diagrams-dot-src" and "show-state-diagrams-xml".
* graphviz.cc: Define INCLUDE_MAP, INCLUDE_STRING, and
INCLUDE_VECTOR.  Include "xml.h", "xml-printer.h", "pex.h" and
"selftest.h".
(graphviz_out::graphviz_out): Extract...
(dot::writer::writer): ...this.
(graphviz_out::write_indent): Convert to...
(dot::writer::write_indent): ...this.
(graphviz_out::print): Use get_pp.
(graphviz_out::println): Likewise.
(graphviz_out::begin_tr): Likewise.
(graphviz_out::end_tr): Likewise.
(graphviz_out::begin_td): Likewise.
(graphviz_out::end_td): Likewise.
(graphviz_out::begin_trtd): Likewise.
(graphviz_out::end_tdtr): Likewise.
(dot::ast_node::dump): New.
(dot::id::id): New.
(dot::id::print): New.
(dot::id::is_identifier_p): New.
(dot::kv_pair::print): New.
(dot::attr_list::print): New.
(dot::stmt_list::print): New.
(dot::stmt_list::add_edge): New.
(dot::stmt_list::add_attr): New.
(dot::graph::print): New.
(dot::stmt_with_attr_list::set_label): New.
(dot::node_stmt::print): New.
(dot::attr_stmt::print): New.
(dot::kv_stmt::print): New.
(dot::node_id::print): New.
(dot::port::print): New.
(dot::edge_stmt::print): New.
(dot::subgraph::print): New.
(dot::make_svg_document_buffer_from_graph): New.
(dot::make_svg_from_graph): New.
(selftest::test_ids): New.
(selftest::test_trivial_graph): New.
(selftest::test_layout_example): New.
(selftest::graphviz_cc_tests): New.
* graphviz.h (xml::node): New forward decl.
(class graphviz_out): Split out into...
(class dot::writer): ...this new class
(struct dot::ast_node): New.
(struct dot::id): New.
(struct dot::kv_pair): New.
(struct dot::attr_list): New.
(struct dot::stmt_list): New.
(struct dot::graph): New.
(struct dot::stmt): New.
(struct dot::stmt_with_attr_list): New.
(struct dot::node_stmt): New.
(struct dot::attr_stmt): New.
(struct dot::kv_stmt): New.
(enum class dot::compass_pt): New.
(struct dot::port): New.
(struct dot::node_id): New.
(struct dot::edge_stmt): New.
(struct dot::subgraph): New.
(dot::make_svg_from_graph): New.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add
"xml-state" flag.
(html_scheme_handler::make_sink): Add flags "show-state-diagrams",
"show-state-diagram-dot-src", and "show-state-diagram-xml".
* pex.cc: New file.
* pex.h: New file.
* selftest-run-tests.cc (selftest::run_tests): Call
graphviz_cc_tests.
* selftest.h (selftest::graphviz_cc_tests): New decl.
* xml.cc (xml::node_with_children::add_comment): New.
(xml::node_with_children::find_child_element): New.
(xml::element::get_attr): New.
(xml::comment::write_as_xml): New.
(selftest::test_printer): Add coverage of find_child_element and
get_attr.
(selftest::test_comment): New.
(selftest::xml_cc_tests): Call test_comment.
* xml.h: New forward decls.
(xml::node::dyn_cast_text): Use nullptr.
(xml::node::dyn_cast_element): New vfunc.
(xml::node_with_children::add_comment): New decl.
(xml::node_with_children::find_child_element): New decl.
(xml::element::dyn_cast_element): New vfunc impl.
(xml::element::get_attr): New decl.
(struct xml::comment): New xml::node subclass.

gcc/analyzer/ChangeLog:
PR other/116792
* ana-state-to-diagnostic-state.cc: New file.
* ana-state-to-diagnostic-state.h: New file.
* checker-event.cc: Include "xml.h".
(checker_event::checker_event): Initialize m_path.
(checker_event::prepare_for_emission): Store the path pointer into
m_path.
(checker_event::maybe_make_xml_state): New.
(function_entry_event::function_entry_event): Add "state" param
and use it to initialize m_state.
(superedge_event::get_program_state): New.
(call_event::get_program_state): New.
(warning_event::get_program_state): New.
* checker-event.h (checker_event::get_program_state): New vfunc.
(checker_event::maybe_make_xml_state): New decl.
(checker_event::m_path): New field.
(statement_event::get_program_state): New vfunc impl.
(function_entry_event::function_entry_event): Add "state" param.
(function_entry_event::get_program_state): New vfunc impl.
(function_entry_event::m_state): New field.
(state_change_event::get_program_state): New vfunc impl.
(superedge_event::get_program_state): New vfunc decl.
(warning_event::warning_event): Add "program_state_" param and
copy it.
(warning_event::get_program_state): New vfunc decl.
(warning_event::m_program_state): New field.
* checker-path.h (checker_path::checker_path): Add ext_state param.
(checker_path::get_ext_state): New accessor.
(checker_path::m_ext_state): New field.
* common.h: Define INCLUDE_MAP and INCLUDE_STRING.
* diagnostic-manager.cc (saved_diagnostic::operator==): Don't
deduplicate dump_path_diagnostic instances.
(diagnostic_manager::emit_saved_diagnostic): Pass ext_state to
checker_path ctor.
* engine.cc:
(impl_region_model_context::on_state_leak): Pass old and new state
to state_machine::on_leak.
(exploded_node::on_stmt_pre): Implement __analyzer_dump_xml and
__analyzer_dump_dot.
* exploded-graph.h (impl_region_model_context::get_state): New.
* infinite-recursion.cc
(recursive_function_entry_event::recursive_function_entry_event):
Add "dst_state" param and pass to function_entry_event ctor.
(infinite_recursion_diagnostic::add_function_entry_event): Pass state
to event ctor.
* kf-analyzer.cc: Include "analyzer/program-state.h".
(dump_path_diagnostic::dump_path_diagnostic): Add "state" param.
(dump_path_diagnostic::get_final_state): New.
(dump_path_diagnostic::m_state): New field.
(kf_analyzer_dump_path::impl_call_pre): Pass state to warning.
* pending-diagnostic.cc
(pending_diagnostic::add_function_entry_event): Pass state to
function_entry_event.
(pending_diagnostic::add_final_event): Likewise to warning_event.
* pending-diagnostic.h (pending_diagnostic::get_final_state): New
vfunc decl.
* program-state.cc: Include "diagnostic-state.h", "graphviz.h" and
"analyzer/ana-state-to-diagnostic-state.h".
(program_state::dump_dot): New.
* program-state.h: Include "text-art/tree-widget.h" and
"analyzer/store.h".
(class xml::document): New forward decl.
(make_xml): New.
(dump_xml_to_pp): New.
(dump_xml_to_file): New.
(dump_xml): New.
(dump_dot): New.
* record-layout.cc (record_layout::record_layout): Make param
const_tree.
* record-layout.h (item::item): Likewise.
(item::m_field): Likewise.
(record_layout::record_layout): Likewise.
(record_layout::begin): New.
(record_layout::end): New.
* region-model.cc
(exposure_through_uninit_copy::complain_about_fully_uninit_item):
Use const_tree.
(exposure_through_uninit_copy::complain_about_partially_uninit_item):
Likewise.
* region-model.h (region_model_context::get_state): New vfunc.
(noop_region_model_context::get_state): New.
(region_model_context_decorator::get_state): New.
* sm-fd.cc (fd_leak::fd_leak): Add "final_state" param and capture
it if present.
(fd_leak::get_final_state): New.
(fd_leak::m_final_state): New.
(fd_state_machine::on_open): Pass nullptr for new "final_state"
param.
(fd_state_machine::on_creat): Likewise.
(fd_state_machine::on_socket): Likewise.
(fd_state_machine::on_accept): Likewise.
(fd_state_machine::on_leak): Add state params and pass new state
as final state to fd_leak ctor.
* sm-file.cc: Include "analyzer/program-state.h".
(file_leak::file_leak): Add "final_state" param and capture it if
present.
(file_leak::get_final_state): New.
(file_leak::m_final_state): New.
(fileptr_state_machine::on_leak): Add state params and pass new
state as final state to file_leak ctor.
* sm-malloc.cc: Include
"analyzer/ana-state-to-diagnostic-state.h".
(malloc_leak::malloc_leak): Add "final_state" param and use it.
(malloc_leak::get_final_state): New vfunc impl.
(malloc_leak::m_final_state): New field.
(malloc_state_machine::on_leak): Add state params; capture final
state.
(malloc_state_machine::add_state_to_xml): New.
* sm.cc (state_machine::on_leak): Add "old_state" and "new_state"
params.  Use nullptr.
(state_machine::add_state_to_xml): New.
(state_machine::add_global_state_to_xml): New.
* sm.h (class xml_state): New forward decl.
(state_machine::on_leak): Add state params.
(state_machine::add_state_to_xml): New vfunc decl.
(state_machine::add_global_state_to_xml): New vfunc decl.
* store.h (bit_range::operator<): New.
* varargs.cc (va_list_leak::va_list_leak): Add final_state param
and capture it if non-null.
(va_list_leak::get_final_state): New.
(va_list_leak::m_final_state): New.
(va_list_state_machine::on_leak): Add state params and pass final
state to va_list_leak ctor.

gcc/testsuite/ChangeLog:
PR other/116792
* g++.dg/analyzer/state-diagram.C: New test.
* gcc.dg/analyzer/analyzer-decls.h (__analyzer_dump_dot): New
decl.
(__analyzer_dump_xml): New decl.
* gcc.dg/analyzer/state-diagram-1-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-1.c: New test.
* gcc.dg/analyzer/state-diagram-2.c: New test.
* gcc.dg/analyzer/state-diagram-3.c: New test.
* gcc.dg/analyzer/state-diagram-4.c: New test.
* gcc.dg/analyzer/state-diagram-5-html.py: New test script.
* gcc.dg/analyzer/state-diagram-5-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-5.c: New test.
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Define INCLUDE_STRING.
* gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise.
* lib/htmltest.py (ns): Add SVG namespace.
* lib/sarif.py (get_result_by_index): New.
(get_xml_state): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: handle pp_token::kind::event_id in experimental-html sink [PR116792]

gcc/ChangeLog:
PR other/116792
* diagnostic-format-html.cc (html_token_printer::print_tokens):
Handle pp_token::kind::event_id.
(selftest::test_token_printer): Add coverage of printing an event
id.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vsaddu.vv combine to
vsaddu.vx, with GR2VR cost 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vsaddu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check and run test for vec_duplicate + vsaddu.vv
combine to vsaddu.vx, with GR2VR cost 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

This patch combines vec_duplicate + vsaddu.vv into vsaddu.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR.  Late-combine will perform the combination if
the cost of GR2VR is zero, and reject it if the GR2VR cost is greater
than zero.

Assume we have example code like below, with GR2VR cost 0.

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  T sat_add(T a, T b)
  {
    return (a + b) | (-(T)((T)(a + b) < a));
  }

  DEF_VX_BINARY(uint32_t, sat_add)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vsaddu.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vsaddu.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3
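
The example source above uses a placeholder type T.  A self-contained
uint32_t instantiation could look like the sketch below; the function names
are hypothetical and the macro is expanded by hand.

```c
#include <stdint.h>

/* Branchless unsigned saturating add: if the sum wraps around it is
   smaller than A, the comparison yields 1, and negating that gives an
   all-ones mask which saturates the result.  */
static inline uint32_t
sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum | -(uint32_t) (sum < a);
}

/* Hand-expanded form of DEF_VX_BINARY (uint32_t, sat_add): the scalar X
   is the operand that would otherwise need a vec_duplicate.  */
void
test_vx_binary_u32 (uint32_t *restrict out, uint32_t *restrict in,
		    uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_add_u32 (in[i], x);
}
```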

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case US_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op us_plus.

Signed-off-by: Pan Li <pan2.li@intel.com>

tailc: Allow musttail tail calls with -fsanitize=address [PR120608]

These testcases show another problem with -fsanitize=address
vs. musttail tail calls.  In particular, there can be
  .ASAN_MARK (POISON, &a, 4);
etc. calls after a tail call and those just prevent the tailc pass
from marking the musttail calls as [tail call].
Normally, the sanopt pass (which comes after tailc) will optimize those
away: if there are no .ASAN_CHECK calls or normal
function calls dominated by those .ASAN_MARK (POISON, ...) calls, the
poison is not needed, because in the epilog sequence (the one dealt with
in the patch posted earlier today) all the stack slots are unpoisoned anyway
(or poisoned for use-after-return).
Unlike __builtin_tsan_exit_function, .ASAN_MARK is not a real function
and is always expanded inline, so it can never be tail called successfully,
so the patch just ignores those for the cfun->has_musttail && diag_musttail
cases.  If there is a non-musttail call, it will fail worst case during
expansion because there is the epilog asan sequence.

2025-06-12  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120608
* tree-tailcall.cc (empty_eh_cleanup): Ignore .ASAN_MARK (POISON)
internal calls for the cfun->has_musttail case and diag_musttail.
(find_tail_calls): Likewise.

* c-c++-common/asan/pr120608-1.c: New test.
* c-c++-common/asan/pr120608-2.c: New test.

expand: Allow musttail tail calls with -fsanitize=address [PR120608]

The following testcase is rejected by GCC 15 but accepted (with
s/gnu/clang/) by clang.
The problem is that we want to execute a sequence of instructions to
unpoison all automatic variables in the function and mark the var block
allocated for use-after-return sanitization poisoned after the call,
so we were just disabling tail calls if there are any instructions
returned from asan_emit_stack_protection.
It is fine and necessary for normal tail calls, but for musttail
tail calls we actually document that accessing the automatic vars of
the caller is UB as if they end their lifetime right before the tail
call, so we also want address sanitizer use-after-return to diagnose
that.

The following patch will only disable normal tail calls when that sequence
is present, for musttail it will arrange to emit a copy of that sequence
before the tail call sequence.  That sequence only tweaks the shadow memory
and nothing in the code emitted by call expansion should touch the shadow
memory, so it is ok to emit it already before argument setup.

2025-06-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120608
* cfgexpand.cc: Include rtl-iter.h.
(expand_gimple_tailcall): Add ASAN_EPILOG_SEQ argument, if non-NULL
and expand_gimple_stmt emitted a tail call, emit a copy of that
insn sequence before the call sequence.
(expand_gimple_basic_block): Remove DISABLE_TAIL_CALLS argument, add
ASAN_EPILOG_SEQ argument.  Disable tail call flag only on non-musttail
calls if that flag is set, pass it to expand_gimple_tailcall.
(pass_expand::execute): Pass VAR_RET_SEQ directly as last
expand_gimple_basic_block argument rather than its comparison with
NULL.

* g++.dg/asan/pr120608.C: New test.

vect: Use combined peeling and versioning for mutually aligned DRs

Current GCC uses either peeling or versioning, but not in combination,
to handle unaligned data references (DRs) during vectorization. This
limitation causes some loops with early break to fall back to scalar
code at runtime.

Consider the following loop with DRs in its early break condition:

for (int i = start; i < end; i++) {
  if (a[i] == b[i])
    break;
  count++;
}

In the loop, references to a[] and b[] need to be strictly aligned for
vectorization because speculative reads that may cross page boundaries
are not allowed. Current GCC does versioning for this loop by creating a
runtime check like:

((&a[start] | &b[start]) & mask) == 0

to see if both initial addresses have all-zero lower bits. If the above
runtime check fails, the loop will fall back to scalar code. However,
it's often possible that DRs are all unaligned at the beginning but they
become all aligned after a few loop iterations. We call this situation
DRs being "mutually aligned".

This patch enables combined peeling and versioning to avoid loops with
mutually aligned DRs falling back to scalar code. Specifically, the
function vect_peeling_supportable is updated in this patch to return a
three-state enum indicating how peeling can make all unsupportable DRs
aligned. In addition to previous true/false return values, a new state
peeling_maybe_supported is used to indicate that peeling may be able to
make these DRs aligned but we are not sure about it at compile time. In
this case, peeling should be combined with versioning so that a runtime
check will be generated to guard the peeled vectorized loop.

A new type of runtime check is also introduced for combined peeling and
versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true.
The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
have the same lower address bits. For the loop above, the new test will
generate an XOR between the two addresses, like:

((&a[start] ^ &b[start]) & mask) == 0

Therefore, if a and b have the same alignment step (element size) and
the same offset from an alignment boundary, a peeled vectorized loop
will run. This new runtime check also works for >2 DRs, with the LHS
expression being:

((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask

where ai is the address of the i'th DR.
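
The difference between the two runtime checks can be sketched in plain C.  This is an illustration only, not the code the vectorizer emits; the helper names all_aligned and mutually_aligned are invented here, and addresses are modelled as uintptr_t values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Versioning-only check: OR all addresses together and require that
   the low bits selected by MASK are zero, i.e. every address is
   already aligned.  */
static bool all_aligned(const uintptr_t *addrs, size_t n, uintptr_t mask)
{
    uintptr_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= addrs[i];
    return (acc & mask) == 0;
}

/* Combined peeling and versioning check: XOR adjacent addresses and
   OR the results, so the test passes when all addresses share the
   same low bits, whatever those bits are.  */
static bool mutually_aligned(const uintptr_t *addrs, size_t n, uintptr_t mask)
{
    uintptr_t acc = 0;
    for (size_t i = 0; i + 1 < n; i++)
        acc |= addrs[i] ^ addrs[i + 1];
    return (acc & mask) == 0;
}
```

With a 16-byte mask (0xf), the addresses 0x1008 and 0x2008 fail the first check but pass the second, since both are 8 bytes past an alignment boundary and the same number of peeled iterations aligns them together.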

This patch is bootstrapped and regression tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_peeling_supportable): Return new
enum values to indicate if combined peeling and versioning can
potentially support vectorization.
(vect_enhance_data_refs_alignment): Support combined peeling and
versioning in vectorization analysis.
* tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
Add a new type of runtime check for mutually aligned DRs.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
default value of allow_mutual_alignment in the initializer list.
* tree-vectorizer.h (enum peeling_support): Define type of
peeling support for function vect_peeling_supportable.
(LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.

match: Simplify doubled not, negate and conjugate operators to a non-lvalue

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`, `conj(conj(X))`): Add a
NON_LVALUE_EXPR wrapper to the simplification of doubled unary
operators NEGATE_EXPR, BIT_NOT_EXPR and CONJ_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.

tree-optimization/120729 - limit compile time in uninit_analysis::prune_phi_opnds

The testcase in this PR shows, on the GCC 14 branch, that in some
degenerate cases we can spend exponential time pruning always
initialized paths through a web of PHIs. The following adds
--param uninit-max-prune-work, defaulted to 100000, to limit that
to effectively O(1).
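
The general technique of bounding a recursive walk with a shared work budget can be sketched as follows.  This is a generic illustration, not the code in gimple-predicate-analysis.cc; struct node and walk are hypothetical names, and the two-successor node stands in for a PHI with two operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical graph node with up to two successors.  */
struct node {
    struct node *succ[2];
};

/* Visit every path below N.  Each visit costs one unit of *BUDGET,
   which is shared by the whole recursion.  Returns false as soon as
   the budget is exhausted so the caller can give up conservatively
   instead of exploring an exponential number of paths.  */
static bool walk(const struct node *n, unsigned *budget)
{
    if (n == NULL)
        return true;
    if (*budget == 0)
        return false;
    --*budget;
    for (int i = 0; i < 2; i++)
        if (!walk(n->succ[i], budget))
            return false;
    return true;
}
```

A chain of 40 nodes in which both successors point at the next node has 2^39 paths to the last node; with a budget of 100000 the walk stops after 100000 visits and reports failure rather than enumerating them all.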

PR tree-optimization/120729
* gimple-predicate-analysis.h (uninit_analysis::prune_phi_opnds):
Add argument of work budget remaining.
* gimple-predicate-analysis.cc (uninit_analysis::prune_phi_opnds):
Likewise. Maintain and honor it throughout the recursion.
* params.opt (uninit-max-prune-work): New.
* doc/invoke.texi (uninit-max-prune-work): Document.

vregs: Use force_subreg when instantiating subregs [PR120721]

In this PR, we started with:

    (subreg:V2DI (reg:DI virtual-reg) 0)

and vregs instantiated the virtual register to the argument pointer.
But:

    (subreg:V2DI (reg:DI ap) 0)

is not a sensible subreg, since the argument pointer certainly can't
be referenced in V2DImode.  This is (IMO correctly) rejected after
g:2dcc6dbd8a00caf7cfa8cac17b3fd1c33d658016.

The vregs code that instantiates the subreg above is specific to
rvalues and already creates new instructions for nonzero offsets.
It is therefore safe to use force_subreg instead of simplify_gen_subreg.

I did wonder whether we should instead say that a subreg of a
virtual register is invalid if the same subreg would be invalid
for the associated hard registers.  But the point of virtual registers
is that the offsets from the hard registers are not known until after
expand has finished, and if an offset is nonzero, the virtual register
will be instantiated into a pseudo that contains the sum of the hard
register and the offset.  The subreg would then be correct for that
pseudo.  The subreg is only invalid in this case because there is
no offset.

gcc/
PR rtl-optimization/120721
* function.cc (instantiate_virtual_regs_in_insn): Use force_subreg
instead of simplify_gen_subreg when instantiating an rvalue SUBREG.

gcc/testsuite/
PR rtl-optimization/120721
* g++.dg/torture/pr120721.C: New test.

x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

Don't use vmovdqu16/vmovdqu8 with non-EVEX register operands just because
AVX512BW is available.

gcc/

PR target/120728
* config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovdqu8
only with EVEX register operands.

gcc/testsuite/

PR target/120728
* gcc.target/i386/avx512bw-vmovdqu16-1.c: Scan vmovdqu for
non-EVEX register operands.
* gcc.target/i386/avx512bw-vmovdqu8-1.c: Likewise.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr95483-5.c: Likewise.
* gcc.target/i386/pr120728.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Add PROCESSOR_XXX comments to processor_cost_table

Add a PROCESSOR_XXX comment to each entry in processor_cost_table to
describe which processor the cost entry is applied to.

* config/i386/i386-options.cc (processor_cost_table): Add a
PROCESSOR_XXX comment to each entry.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

Ada: Replace hardcoded GNAT commands for GNAT tools

This replaces the hardcoded gnat{make,link,bind,ls} commands with expansion
of the GNAT{MAKE,BIND} variables computed by the configure machinery, during
the build of the GNAT tools.

The default GNATMAKE_FOR_HOST duplicates the default GNATMAKE, and someone
setting GNATMAKE in the toplevel configuration may want it applied for all
host compilations. Direct assignment of GNATMAKE_FOR_HOST keeps working.

gcc/ada/
PR ada/120106
* gcc-interface/Make-lang.in: Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST
from GNAT{MAKE,BIND} instead of using hardcoded commands.
gnattools/
PR ada/120106
* configure.ac: Remove ACX_NONCANONICAL_HOST and add ACX_PROG_GNAT.
* configure: Regenerate.
* Makefile.in: Do not substitute host_noncanonical but substitute
GNATMAKE and GNATBIND.
Set GNAT{MAKE,BIND,LINK_LS}_FOR_HOST from GNAT{MAKE,BIND} instead
of using hardcoded commands.

Ada: Remove obsolete stuff in Makefile fragment

gcc/ada/
* Make-generated.in: Remove obsolete stuff.

Ada: Introduce GNATMAKE_FOR_BUILD Makefile variable

This gets rid of the hardcoded 'gnatmake' command used during the build.

/
PR ada/120106
* Makefile.tpl: Add GNATMAKE_FOR_BUILD to {HOST,BASE_TARGET}_EXPORTS
* Makefile.in: Regenerate.
* configure.ac: Set the default and substitute the variable.
* configure: Regenerate.
gcc/ada/
PR ada/120106
* Make-generated.in: Use GNATMAKE_FOR_BUILD instead of gnatmake.
* gcc-interface/Makefile.in: Likewise.

[RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hosts

So this is Andrew's patch from the PR. We weren't clean for a 32bit host in
some of the arithmetic for constant synthesis.
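
As a general illustration of this class of problem (not the actual change, which is in riscv_build_integer_1 and is not reproduced here): a mask built with 1UL is only 32 bits wide on an ILP32 host, so shifting it by 32 or more is undefined.  Assuming the issue is of that familiar kind, a host-independent version uses an explicitly 64-bit constant.  The helper name clear_bit64 is made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: clear bit BIT (0..63) of a 64-bit constant.
   UINT64_C(1) is 64 bits wide on every host, whereas 1UL would be
   32 bits wide on a 32-bit host and could not represent bits 32..63.  */
static uint64_t clear_bit64(uint64_t value, unsigned bit)
{
    return value & ~(UINT64_C(1) << bit);
}
```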

I confirmed the bug on a 32bit linux host, then confirmed that Andrew's patch
from the PR fixes the problem, then ran Andrew's patch through my tester
successfully.

Naturally I'll wait for pre-commit testing, but I'm not expecting problems.

PR target/119830
gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Make arithmetic in bclr case
clean for 32 bit hosts.

gcc/testsuite/
* gcc.target/riscv/pr119830.c: New test.

[committed][PR rtl-optimization/120550] Drop REG_EQUAL note after ext-dce transformation

This bug was found by Edwin's fuzzing efforts on RISC-V, though it likely
affects other targets.

In simplest terms, when ext-dce converts an extension into a (possibly
simplified) subreg copy, it may make an attached REG_EQUAL note invalid.

In the case Edwin found, the note was an extension, but I don't think that would
necessarily always be the case.  The note could have other forms which
potentially need invalidation.  So the safest thing to do is just remove any
attached REG_EQUAL or REG_EQUIV note.

Note that adjusting Edwin's testcase in the obvious way to avoid having to interpret
printf output for pass/fail status makes the bug go latent.  That's why no
testcase is included with this patch.

Bootstrapped and regression tested on x86_64.  Obviously also verified it fixes
the testcase Edwin filed.

This is a good candidate for cherry-picking to the gcc-15 release branch after
simmering on the trunk a bit.

PR rtl-optimization/120550
gcc/
* ext-dce.cc (ext_dce_try_optimize_insn): Drop REG_EQUAL/REG_EQUIV
notes on modified insns.

xtensa: Make use of DEPBITS instruction

This patch implements the bitfield insertion MD pattern using the DEPBITS
machine instruction, the counterpart of the EXTUI instruction, if
available.

     /* example */
     struct foo {
       unsigned int b:10;
       unsigned int r:11;
       unsigned int g:11;
     };
     void test(struct foo *p) {
       p->g >>= 1;
     }

     ;; result (endianness: little)
     test:
      entry sp, 32
      l32i.n a8, a2, 0
      extui a9, a8, 1, 10
      depbits a8, a9, 0, 11
      s32i.n a8, a2, 0
      retw.n

gcc/ChangeLog:

* config/xtensa/xtensa.h (TARGET_DEPBITS): New macro.
* config/xtensa/xtensa.md (insvsi): New insn pattern.

xtensa: Implement TARGET_ZERO_CALL_USED_REGS

This patch implements the target-specific ZERO_CALL_USED_REGS hook, since
with -fzero-call-used-regs=all the default hook tries to assign 0 to B0
(bit 0 of the BR register) and an ICE is thrown.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_zero_call_used_regs):
New prototype and function.
(TARGET_ZERO_CALL_USED_REGS): Define macro.

Fix some problems with afdo propagation

This patch fixes problems I noticed by exploring profiles of some hot
functions in GCC. In particular, the propagation sometimes changed a
precise 0 to an afdo 0 for paths calling abort, and sometimes we could
propagate more when we accept that some paths have a 0 count.
Finally, there was an important bug in computing all_known which
resulted in BB probabilities being quite broken after afdo.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* auto-profile.cc (update_count_by_afdo_count): Make static;
add variant accepting profile_count.
(afdo_find_equiv_class): Use update_count_by_afdo_count.
(afdo_propagate_edge): Likewise.
(afdo_propagate): Likewise.
(afdo_calculate_branch_prob): Fix handling of all_known.
(afdo_annotate_cfg): Annotate by 0 where both afdo and static
profile agree.