git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: Use riscv_2x_xlen_mode_p [NFC]

Use riscv_2x_xlen_mode_p to check that the mode size is 2x XLEN, instead of
using "(GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))".
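
As a rough illustration of this kind of refactoring, the open-coded size
comparison can be wrapped in a named predicate.  The sketch below is
standalone and uses stand-in definitions: units_per_word and the
mode_unit_size parameter are hypothetical substitutes for UNITS_PER_WORD
and GET_MODE_UNIT_SIZE (mode), and the actual riscv_2x_xlen_mode_p in
riscv.cc may be defined differently.

```cpp
// Stand-in for UNITS_PER_WORD (assumption: an RV32 target, 4 bytes).
constexpr unsigned units_per_word = 4;

// Hypothetical model of the predicate: true when a mode's unit size in
// bytes is exactly twice XLEN.
constexpr bool is_2x_xlen_size (unsigned mode_unit_size)
{
  return mode_unit_size == units_per_word * 2;
}
```

Under these assumptions an 8-byte unit size counts as 2x XLEN, while a
4-byte or 16-byte unit size does not.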

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Use
riscv_2x_xlen_mode_p.
(riscv_binary_cost): Ditto.
(riscv_hard_regno_mode_ok): Ditto.

RISC-V: Adding cost model for zilsd

The motivation for this patch is that we want to use ld/sd where possible
when zilsd is enabled.  However, the subreg pass may split such an access
into two lw/sw instructions because of the cost, and it only checks the
cost of a 64-bit register move, so we need to adjust the cost of a 64-bit
register move as well.

However, even with the adjusted cost model, a 64-bit shift still uses
32-bit loads because it has already been split at expand time.  This may
need to be fixed on the expander side and apparently needs more time to
investigate, so I have just added a testcase with XFAIL to show the current
behavior, and we can fix that when we have time.

In the long term, we may add a new field to riscv_tune_param to control
the cost model for that.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_cost_model): Add cost model for
zilsd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zilsd-code-gen-split-subreg-1.c: New test.
* gcc.target/riscv/zilsd-code-gen-split-subreg-2.c: New test.

x86: Fix shrink wrap separate ICE under -fstack-clash-protection [PR120697]

gcc/ChangeLog:

PR target/120697
* config/i386/i386.cc (ix86_expand_prologue):
Remove 3 assertions and associated code.

gcc/testsuite/ChangeLog:

PR target/120697
* gcc.target/i386/stack-clash-protection.c: New test.

Daily bump.

analyzer: make checker_event::m_kind private

No functional change intended.

gcc/analyzer/ChangeLog:
* checker-event.h (checker_event::get_kind): New accessor.
(checker_event::m_kind): Make private.
* checker-path.cc (checker_path::maybe_log): Use accessor for
checker_event::m_kind.
(checker_path::add_event): Likewise.
(checker_path::debug): Likewise.
(checker_path::cfg_edge_pair_at_p): Likewise.
(checker_path::inject_any_inlined_call_events): Likewise.
* diagnostic-manager.cc
(diagnostic_manager::prune_for_sm_diagnostic): Likewise.
(diagnostic_manager::prune_for_sm_diagnostic): Likewise.
(diagnostic_manager::consolidate_conditions): Likewise.
(diagnostic_manager::consolidate_unwind_events): Likewise.
(diagnostic_manager::finish_pruning): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Add space after foo in testcase

gcc/testsuite/
* gcc.dg/pr119039-1.c: Add space in search criteria.

emit-rtl: Use simplify_subreg_regno to validate hardware subregs [PR119966]

PR119966 showed that combine could generate unfoldable hardware subregs
for pru-unknown-elf.  To fix, strengthen the checks performed by
validate_subreg.

The simplify_subreg_regno performs more validity checks than
the simple info.representable_p.  Most importantly, the
targetm.hard_regno_mode_ok hook is called to ensure the hardware
register is valid in subreg's outer mode.  This fixes the root cause
for PR119966.

The checks for stack-related registers are bypassed because the i386
backend generates them, in this seemingly valid peephole optimization:

   ;; Attempt to always use XOR for zeroing registers (including FP modes).
   (define_peephole2
     [(set (match_operand 0 "general_reg_operand")
           (match_operand 1 "const0_operand"))]
     "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD
      && (! TARGET_USE_MOV0 || optimize_insn_for_size_p ())
      && peep2_regno_dead_p (0, FLAGS_REG)"
     [(parallel [(set (match_dup 0) (const_int 0))
                 (clobber (reg:CC FLAGS_REG))])]
     "operands[0] = gen_lowpart (word_mode, operands[0]);")

Testing done:
  * No regressions were detected for C and C++ on x86_64-pc-linux-gnu.
  * "contrib/compare-all-tests i386" showed no difference in code
    generation.
  * No regressions for pru-unknown-elf.
  * Reverted r16-809-gf725d6765373f7 to expose the now latent PR119966.
    Then ensured pru-unknown-elf build is ok.  Only two cases regressed
    where rnreg pass transforms a valid hardware subreg into invalid
    one.  But I think that is not related to combine's PR119966:
      gcc.c-torture/execute/20040709-1.c
      gcc.c-torture/execute/20040709-2.c

PR target/119966

gcc/ChangeLog:

* emit-rtl.cc (validate_subreg): Call simplify_subreg_regno
instead of checking info.representable_p.
* rtl.h (simplify_subreg_regno): Add new argument
allow_stack_regs.
* rtlanal.cc (simplify_subreg_regno): Do not reject
stack-related registers if allow_stack_regs is true.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

c++, coroutines: CWG2563 promise lifetime extension [PR115908].

This implements the final piece of the revised CWG2563 wording;
"It exits the scope of promise only if the coroutine completed
without suspending."

Consider the coroutine to be made up of two components: a 'ramp' and a
'body', where the body represents the user's original code and the ramp
is responsible for setting that up and for returning some object to the
original caller.

Coroutine state, and responsibility for its release.

A coroutine has some state that persists across suspensions.

The state has two components:
  * State that is specified by the standard and persists for the entire
    life of the coroutine.
  * Local state that is constructed/destructed as scopes in the original
    function body are entered/exited.  The destruction of local state is
    always the responsibility of the body code.

The persistent state (and the overall storage for the state) must be
managed in two places:
  * The ramp function (which allocates and builds this - and can, in some
    cases, be responsible for destroying it)
  * The re-written function body which can destroy it when that body
    completes its final suspend - or when the handle.destroy () is called.

In all cases the ramp holds responsibility for constructing the standard-
mandated persistent state.

There are four ways in which the ramp might be re-entered after starting
the function body:
  A The body could suspend (one might expect that to be the 'normal' case
    for most coroutines).
  B The body might complete either synchronously or via continuations.
  C An exception might be thrown during the setup of the initial await
    expression, before the initial awaiter resumes.
  D An exception might be processed by promise.unhandled_exception () and
    that, in turn, might re-throw it (or throw something else).  In this
    case, the coroutine is considered suspended at the final suspension
    point.

Once the coroutine has passed initial suspend (i.e. the initial awaiter
await_resume() has been called) the body is considered to have a use of
the state.

Until the ramp return value has been constructed, the ramp is considered
to have a use of the state.

To manage these interacting conditions we allocate a reference counter
for the frame state.  This is initialised to 1 by the ramp as part of its
startup (note that failures/exceptions in the startup code are handled
locally to the ramp).

When the body returns (either normally, or by exception) the ramp releases
its use.

Once the rewritten coroutine body is started, the body is considered to
have a use of the frame.  This use (potentially) needs to be released if
an exception is thrown from the body.  We implement this using an eh-only
cleanup around the initial await.  If we have the case D above, then we
do not release the body use.

In case:

  A, typically the ramp would be re-entered with the body holding a use,
  and therefore the ramp should not destroy the state.

  B, both the body and ramp will have released their uses, and the ramp
  should destroy the state.

  C, we must arrange for the body to release its use, because we require
  the ramp to cleanup in this circumstance.

  D is an outlier, since the responsibility for destruction of the state
  now rests with the user's code (via a handle.destroy() call).

  NOTE: In the case that the body has never suspended before such an
  exception occurs, the only reasonable way for the user code to obtain the
  necessary handle is if unhandled_exception() throws the handle or some
  object that contains the handle.  That is outside of the designs here -
  if the user code might need this corner-case, then such provision will
  have to be made.

In the ramp, we implement destruction for the persistent frame state by
means of cleanups.  These are run conditionally when the reference count
is 0 signalling that both the body and the ramp have completed.

In the body, once we pass the final suspend, then we test the use and
delete the state if the use is 0.
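
The use counting described above can be modelled in a few lines.  This
is only a sketch of the counting scheme for cases A and B: the type and
function names are hypothetical and do not correspond to identifiers in
coroutines.cc, and the exact point at which the body acquires its use is
simplified.

```cpp
// Hypothetical model of the frame use count.
struct frame_model
{
  int refcount = 1;       // Initialised to 1 by the ramp at startup.
  bool destroyed = false; // Set once the persistent state is released.
};

// The body is considered to hold a use once it has been started.
constexpr void body_acquire (frame_model &f) { ++f.refcount; }

// Either party releases its use; whoever drops the count to 0 destroys
// the persistent state.
constexpr void release (frame_model &f)
{
  if (--f.refcount == 0)
    f.destroyed = true;
}

// Case A: the body suspends while still holding its use, then the ramp
// releases its own use on return from the body.
constexpr bool case_a_ramp_destroys ()
{
  frame_model f;
  body_acquire (f);
  release (f); // ramp
  return f.destroyed;
}

// Case B: the body completes and releases its use before control
// returns to the ramp, which then releases the last use.
constexpr bool case_b_ramp_destroys ()
{
  frame_model f;
  body_acquire (f);
  release (f); // body, after final suspend
  release (f); // ramp
  return f.destroyed;
}
```

In this model the ramp leaves the state alone in case A and destroys it
in case B, matching the description of those two cases.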

PR c++/115908
PR c++/118074
PR c++/95615

gcc/cp/ChangeLog:

* coroutines.cc (coro_frame_refcount_id): New.
(coro_init_identifiers): Initialise coro_frame_refcount_id.
(build_actor_fn): Set up initial_await_resume_called.  Handle
decrementing of the frame reference count.  Return directly to
the caller if that is non-zero.
(cp_coroutine_transform::wrap_original_function_body): Use a
conditional eh-only cleanup around the initial await expression
to release the body use on exception before initial await
resume.
(cp_coroutine_transform::build_ramp_function): Wrap the called
body in a cleanup that releases a use of the frame when we
return to the ramp.  Implement frame, promise and argument copy
destruction via conditional cleanups when the frame use count
is zero.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Move to...
* g++.dg/coroutines/torture/pr115908.C: ...here.
* g++.dg/coroutines/torture/pr95615-02.C: Move to...
* g++.dg/coroutines/torture/pr95615-01-promise-ctor-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-03.C: Move to...
* g++.dg/coroutines/torture/pr95615-02-get-return-object-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-01.C: Move to...
* g++.dg/coroutines/torture/pr95615-03-initial-suspend-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-04.C: Move to...
* g++.dg/coroutines/torture/pr95615-04-initial-await-ready-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615-05.C: Move to...
* g++.dg/coroutines/torture/pr95615-05-initial-await-suspend-throws.C: ...here.
* g++.dg/coroutines/torture/pr95615.inc: Add more cases and ensure that the
code completes properly when no exceptions are thrown.
* g++.dg/coroutines/torture/pr95615-00-nothing-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-06-initial-await-resume-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-07-body-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-08-initial-suspend-throws-uhe-throws.C: New test.
* g++.dg/coroutines/torture/pr95615-09-body-throws-uhe-throws.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

intersect_bitmask - Always update bitmask.

The bitmask wasn't always being updated, resulting in some less than
perfect masks being stored.

* value-range.cc (irange::intersect_bitmask): Always update the
stored mask to reflect the current calculated mask.

Improve contains_p and intersect with bitmasks.

Improve the way contains_p (wide_int) and intersect behave with
singletons and bitmasks. Also fix a buglet in bitmask_intersect when the
result is a singleton which is not in the current range.
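
To make the singleton case concrete, here is a simplified model of
testing a single value against a range that carries a bitmask.  The
struct and both functions are hypothetical; they only illustrate the
idea that a value must lie within the bounds and agree with every known
bit, and they are not the irange implementation.

```cpp
#include <cstdint>

// Simplified model: bits set in `unknown` may take either value; every
// other bit must equal the corresponding bit of `known`.
struct masked_range
{
  uint64_t lo, hi;
  uint64_t known;
  uint64_t unknown;
};

// True when V agrees with all of the known bits.
constexpr bool conforms (const masked_range &r, uint64_t v)
{
  return (v & ~r.unknown) == (r.known & ~r.unknown);
}

// True when V is within the bounds and agrees with the bitmask.
constexpr bool contains (const masked_range &r, uint64_t v)
{
  return v >= r.lo && v <= r.hi && conforms (r, v);
}
```

For a range [0, 15] whose low two bits are known to be zero, this model
accepts 8 but rejects 6 (wrong low bits) and 16 (out of bounds).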

PR tree-optimization/119039
gcc/
* value-range.cc (irange::contains_p): Call wide_int version of
contains_p for singleton ranges.
(irange::intersect): If either range is a singleton, use
contains_p.

gcc/testsuite/
* gcc.dg/pr119039-2.c: New.

Simplify switches utilizing subranges.

Adjust simplify_switch_using_ranges to use irange rather than relying
on the older legacy_range mechanism.

PR tree-optimization/119039
gcc/
* vr-values.cc (simplify_using_ranges::legacy_fold_cond): Remove.
(simplify_using_ranges::simplify_switch_using_ranges): Adjust.

gcc/testsuite/
* gcc.dg/pr119039-1.c: New.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust thread counts.

Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics [PR82480]

The GNU intrinsics STAT/LSTAT/FSTAT were inherited from g77, but changed
the names of some keywords: FILE became NAME, and SARRAY became VALUES,
which are the keywords documented in the gfortran manual.
Adjust code and libgfortran error messages to reflect this change.
Furthermore, add compile-time checking that INTENT(OUT) arguments are
definable, and that array VALUES has at least size 13.
Document that integer arguments are of default kind, and that overflows
in conversion to integer return -1 in VALUES.
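
The overflow rule for KIND=4 VALUES can be sketched as follows.  The
helper name is hypothetical and the sketch assumes non-negative stat
components; it is not the code in intrinsics/stat.c.

```cpp
#include <cstdint>

// Hypothetical helper: narrow a 64-bit stat component to a 32-bit
// VALUES element, yielding -1 when it would overflow INT32_MAX.
constexpr int32_t to_values_i4 (int64_t component)
{
  return component > INT32_MAX ? -1 : static_cast<int32_t> (component);
}
```

A component equal to INT32_MAX is still stored unchanged; only larger
values are replaced by -1.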

PR fortran/82480

gcc/fortran/ChangeLog:

* check.cc (gfc_check_fstat): Extend checks to INTENT(OUT) arguments.
(gfc_check_fstat_sub): Likewise.
(gfc_check_stat): Likewise.
(gfc_check_stat_sub): Likewise.
* intrinsic.texi: Adjust documentation.

libgfortran/ChangeLog:

* intrinsics/stat.c (stat_i4_sub_0): Fix argument names. Rename
SARRAY to VALUES also in error message. When array VALUES is
KIND=4, get only stat components that do not overflow INT32_MAX,
otherwise set the corresponding VALUES elements to -1.
(stat_i4_sub): Fix argument names.
(lstat_i4_sub): Likewise.
(stat_i8_sub_0): Likewise.
(stat_i8_sub): Likewise.
(lstat_i8_sub): Likewise.
(stat_i4): Likewise.
(stat_i8): Likewise.
(lstat_i4): Likewise.
(lstat_i8): Likewise.
(fstat_i4_sub): Likewise.
(fstat_i8_sub): Likewise.
(fstat_i4): Likewise.
(fstat_i8): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/stat_3.f90: New test.

cobol: Correct diagnostic strings to rectify bootstrap build

Apply patch from Jakub to enable diagnostics. Use %<%> and %qs liberally.

PR cobol/120621

gcc/cobol/ChangeLog:

* cbldiag.h (yyerror): Add diagnostic attributes.
(yywarn): Same.
(error_msg): Same.
(yyerrorvl): Same.
(cbl_unimplementedw): Same.
(cbl_unimplemented): Same.
(cbl_unimplemented_at): Same.
* cdf-copy.cc (copybook_elem_t::open_file): Supply string argument.
* cdf.y: Use %<%>.
* cobol-system.h (if): Check GCC_VERSION.
(ATTRIBUTE_GCOBOL_DIAG): Define.
* except.cc (cbl_enabled_exception_t::dump): Remove extra %s.
* genapi.cc (get_class_condition_string): Use acceptable message.
(get_bytes_needed): Same.
(move_tree): Same.
(get_string_from): Same.
(internal_perform_through): Same.
(tree_type_from_field_type): Same.
(is_valuable): Same.
(parser_logop): Same.
(parser_relop): Same.
(parser_relop_long): Same.
(parser_if): Same.
(parser_setop): Same.
(parser_perform_conditional): Same.
(parser_file_add): Same.
(parser_file_open): Same.
(parser_file_close): Same.
(parser_file_read): Same.
(parser_file_write): Same.
(inspect_replacing): Same.
(parser_sort): Same.
(parser_file_sort): Same.
(parser_file_merge): Same.
(create_and_call): Same.
(parser_bitop): Same.
(parser_bitwise_op): Same.
(hijack_for_development): Same.
(mh_source_is_literalN): Same.
(mh_dest_is_float): Same.
(parser_symbol_add): Same.
* gengen.cc (show_type): Use acceptable message.
(gg_find_field_in_struct): Same.
(gg_declare_variable): Same.
(gg_printf): Same.
(gg_fprintf): Same.
(gg_tack_on_function_parameters): Same.
(gg_define_function): Same.
(gg_get_function_decl): Same.
(gg_finalize_function): Same.
(gg_call_expr): Same.
(gg_call): Same.
(gg_insert_into_assembler): Define new function.
(gg_insert_into_assemblerf): Use gg_insert_into_assembler().
* gengen.h (gg_insert_into_assembler): Simpler function declaration.
(gg_insert_into_assemblerf): Declare new function.
* genmath.cc (parser_op): Use acceptable message.
* genutil.cc (get_binary_value): Use acceptable message.
* lexio.cc (parse_replacing_pair): Correct diagnostic arguments.
(preprocess_filter_add): Same.
(cdftext::open_input): Same.
* parse.y: Use acceptable messages.
* parse_ante.h (struct evaluate_elem_t): Use %<%>.
(is_callable): Same.
* parse_util.h (intrinsic_invalid_parameter): Use %qs.
* scan.l: Use dialect_error().
* scan_ante.h (numstr_of): Use %qs.
(scanner_token): Quote COBOL tokens in messages.
(scanner_parsing): Correct diagnostic message.
(scanner_parsing_toggle): Quote COBOL tokens in messages.
(scanner_parsing_pop): Same.
(typed_name): Use %qs.
* scan_post.h (prelex): Quote COBOL tokens in message.
* show_parse.h (CHECK_FIELD): Use acceptable message format.
(CHECK_LABEL): Same.
* symbols.cc (symbol_field_same_as): Remove extra spaces.
(cbl_alphabet_t::assign): Use %<%>.
(cbl_field_t::internalize): Quote library name in message.
* symbols.h (struct os_locale_t): Constify codeset.
(class temporaries_t): Add copy constructor.
(struct cbl_alphabet_t): Use acceptable message.
* util.cc (symbol_type_str): Use cbl_internal_error.
(cbl_field_type_str): Same.
(is_elementary): Same.
(cbl_field_t::report_invalid_initial_value): Use %qs.
(class unique_stack): Avoid %m.
(ydferror): Declare function with attributes.
(error_msg): Same.
(cobol_fileline_set): Use %<%>.
(os_locale_t): Remove use of xstrdup.
(cobol_parse_files): Quote C names in message.
(dialect_error): Use %<%>.
* util.h (cbl_message): Add attributes.
(cbl_internal_error): Same.
(cbl_err): Same.
(cbl_errx): Same.

Fix dump_function_to_file use of dump_flags

The function gets dump flags as 'flags' parameter, so shouldn't use
dump_flags.

* tree-cfg.cc (dump_function_to_file): Use flags, not dump_flags.

doc: allow gcov.texi to be processed by makeinfo 4.13

As per documentation, even 4.7 ought to suffice. At least 4.13 objects
to there being a blank between @anchor and the opening curly brace.

gcc/

* doc/gcov.texi: Drop blank after @anchor.

doc: allow extend.texi to be processed by makeinfo 4.13

PR middle-end/120544

As per documentation, even 4.7 ought to suffice. At least 4.13 objects
to there being nothing ahead of the first comma in @xref{}.

gcc/

* doc/extend.texi: Fill first argument of @xref{}.

dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and large _BitInt [PR120631]

The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions, in one direction we end up in ICE in
decimal_from_integer callee because the char buffer is too short for the
needed number of decimal digits, in the conversion of dfp to large _BitInt
we return 0 in the wide_int.

The following patch fixes the ICE by using larger buffer (XALLOCAVEC
allocated, it will never be larger than 65536 / 3 bytes) in the larger
_BitInt case, and the other direction by setting exponent to exp % 19
and instead multiplying the result by needed powers of 10^19 (10^19 chosen
as largest power of ten that can fit into UHWI).
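
The exponent handling can be illustrated with a small sketch.  The names
below are hypothetical and this is not the code in dfp.cc; it only shows
how an exponent splits into a remainder (exp % 19) plus a number of
multiplications by 10^19.

```cpp
#include <cstdint>

// 10^19, the largest power of ten representable in an unsigned 64-bit
// value.
constexpr uint64_t pow10_19 = 10000000000000000000ull;

struct split_exponent
{
  unsigned chunks;    // Number of multiplications by 10^19.
  unsigned remainder; // Exponent left for the ordinary conversion.
};

// Split EXP so that 10^EXP == 10^remainder * (10^19)^chunks.
constexpr split_exponent split (unsigned exp)
{
  return split_exponent{ exp / 19, exp % 19 };
}
```

For example, an exponent of 40 splits into a remainder of 2 and two
multiplications by 10^19.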

2025-06-18 Jakub Jelinek <jakub@redhat.com>

PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.

* gcc.dg/dfp/bitint-9.c: New test.

RISC-V: Add test for vec_duplicate + vmin.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vmin.vv combine to
vmin.vx, with GR2VR costs of 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vmin.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vmin.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check and run test for vec_duplicate + vmin.vv
combine to vmin.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmin-run-2-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vmin.vv to vmin.vx on GR2VR cost

This patch combines vec_duplicate + vmin.vv into vmin.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR.  Late-combine performs the combination if the
GR2VR cost is zero and rejects it if the GR2VR cost is greater than
zero.

Assume we have the example code below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  int32_t min(int32_t a, int32_t b)
  {
    return a > b ? b : a;
  }

  DEF_VX_BINARY(int32_t, min)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmin.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmin.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case SMIN.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op smin.

Signed-off-by: Pan Li <pan2.li@intel.com>

x86: Enable separate shrink wrapping

This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
enable separate shrink wrapping for function prologues/epilogues in
x86.

When performing separate shrink wrapping, we use mov instead of
push/pop, because push/pop makes the rsp adjustment more complicated to
handle and may cost performance.  Using mov has a small impact on code
size but preserves performance.

Using mov means we need to use sub/add to maintain the stack frame.  In
some special cases, we need to use lea to avoid affecting EFlags.

For example, a sub inserted between test, je and jle would change
EFlags, so lea should be used instead:

    foo:
        xorl    %eax, %eax
        testl   %edi, %edi
        je      .L11
        sub     $16, %rsp  ------> leaq    -16(%rsp), %rsp
        movq    %r13, 8(%rsp)
        movl    $1, %r13d
        jle     .L4

Tested against SPEC CPU 2017, this change always has a net-positive
effect on the dynamic instruction count.  See the following table for
the breakdown on how this reduces the number of dynamic instructions
per workload on a like-for-like basis (with/without this commit):

instruction count  base          with commit   (commit-base)/commit
502.gcc_r          98666845943   96891561634   -1.80%
526.blender_r      6.21226E+11   6.12992E+11   -1.33%
520.omnetpp_r      1.1241E+11    1.11093E+11   -1.17%
500.perlbench_r    1271558717    1263268350    -0.65%
523.xalancbmk_r    2.20103E+11   2.18836E+11   -0.58%
531.deepsjeng_r    2.73591E+11   2.72114E+11   -0.54%
500.perlbench_r    64195557393   63881512409   -0.49%
541.leela_r        2.99097E+11   2.98245E+11   -0.29%
548.exchange2_r    1.27976E+11   1.27784E+11   -0.15%
527.cam4_r         88981458425   88887334679   -0.11%
554.roms_r         2.60072E+11   2.59809E+11   -0.10%

Collected spec2017 performance on ZNVER5, EMR and ICELAKE. No performance regression was observed.

For O2 multi-copy:
511.povray_r improved by 2.8% on ZNVER5.
511.povray_r improved by 4% on EMR.
511.povray_r improved by 3.3% ~ 4.6% on ICELAKE.

gcc/ChangeLog:

* config/i386/i386-protos.h (ix86_get_separate_components):
New function.
(ix86_components_for_bb): Likewise.
(ix86_disqualify_components): Likewise.
(ix86_emit_prologue_components): Likewise.
(ix86_emit_epilogue_components): Likewise.
(ix86_set_handled_components): Likewise.
* config/i386/i386.cc (save_regs_using_push_pop):
Split from ix86_compute_frame_layout.
(ix86_compute_frame_layout):
Use save_regs_using_push_pop.
(pro_epilogue_adjust_stack):
Use gen_pro_epilogue_adjust_stack_add_nocc.
(ix86_expand_prologue): Add some assertions and adjust
the stack frame at the beginning of the prolog for shrink
wrapping separate.
(ix86_emit_save_regs_using_mov):
Skip registers that are wrapped separately.
(ix86_emit_restore_regs_using_mov): Likewise.
(ix86_expand_epilogue): Add some assertions and set
restore_regs_via_mov to true for shrink wrapping separate.
(ix86_get_separate_components): New function.
(ix86_components_for_bb): Likewise.
(ix86_disqualify_components): Likewise.
(ix86_emit_prologue_components): Likewise.
(ix86_emit_epilogue_components): Likewise.
(ix86_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
(TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
(TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
* config/i386/i386.h (struct machine_function): Add
reg_is_wrapped_separately array for register wrapping
information.
* config/i386/i386.md
(@pro_epilogue_adjust_stack_add_nocc<mode>): New.

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/callabi/leaf-2.c: Adjust the test.
* gcc.target/i386/interrupt-16.c: Likewise.
* gfortran.dg/guality/arg1.f90: Likewise.
* gcc.target/i386/avx10_2-comibf-1.c: Likewise.
* g++.target/i386/shrink_wrap_separate.C: New test.
* gcc.target/i386/shrink_wrap_separate_check_lea.c: Likewise.

Co-authored-by: Michael Matz <matz@suse.de>

Snap subrange boundaries to bitmask constraints.

Ensure all subrange endpoints conform to the bitmask.
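
As a simplified illustration of snapping, consider the special case
where the bitmask says the low bits are known to be zero, i.e. every
value is a multiple of a power of two.  The names below are hypothetical
and this is not the irange::snap implementation; the sketch assumes
`align` is a power of two and that lo + align - 1 does not overflow.

```cpp
#include <cstdint>

struct bounds
{
  uint64_t lo, hi;
};

// Round the lower bound up and the upper bound down to the nearest
// multiple of ALIGN, so that both endpoints conform to the mask.
constexpr bounds snap_to_alignment (bounds b, uint64_t align)
{
  return bounds{ (b.lo + align - 1) & ~(align - 1), b.hi & ~(align - 1) };
}
```

With an alignment of 4, the subrange [3, 14] becomes [4, 12].  If the
snapped lower bound ends up above the snapped upper bound, the subrange
contains no conforming value at all.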

PR tree-optimization/120661
gcc/
* value-range.cc (irange::snap): New.
(irange::snap_subranges): New.
(irange::set_range_from_bitmask): Call snap_subranges.
* value-range.h (snap, snap_subranges): New prototypes.

gcc/testsuite/
* gcc.dg/pr120661-1.c: New.
* gcc.dg/pr120661-2.c: New.

Daily bump.

c++, coroutines: Remove use of coroutine handle in the frame.

We have been keeping a copy of coroutine_handle<promise> in the state
frame, as it was expected to be efficient to use this to initialize the
argument to await_suspend.  This does not turn out to be the case, and
initializing the value is obstructive to the CWG2563 fixes.  This
removes the use.

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Update comments.
(struct coro_aw_data): Remove self_handle and add in
information to create the handle in lowering.
(expand_one_await_expression): Build a temporary coroutine
handle.
(build_actor_fn): Remove reference to the frame copy of the
coroutine handle.
(cp_coroutine_transform::wrap_original_function_body): Remove
reference to the frame copy of the coroutine handle.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Ada: Fix assertion failure on problematic container aggregate

This is an assertion failure on code using a container aggregate in the
primitives referenced by the Aggregate aspect, which cannot work.

gcc/ada/
PR ada/120665
* sem_aggr.adb (Resolve_Container_Aggregate): Use robust guards.

gcc/testsuite/
* gnat.dg/specs/aggr8.ads: New test.

PR modula2/120673: Mutually dependent types crash the compiler

This patch fixes an ICE which occurs if cyclically dependent types
are used when declaring a variable.  This patch detects the
cyclic dependency and issues an error message for each outstanding
component.

gcc/m2/ChangeLog:

PR modula2/120673
* gm2-compiler/M2GCCDeclare.mod (ErrorDepList): New
global variable set containing every errant dependency symbol.
(mystop): Remove.
(EmitCircularDependancyError): Replace with ...
(EmitCircularDependencyError): ... this.
(AssertAllTypesDeclared): Rewrite.
(DoVariableDeclaration): Ditto.
(TypeDependentsDeclared): New procedure function.
(PrepareGCCVarDeclaration): Ditto.
(DeclareVariable): Remove assert.
(DeclareLocalVariable): Ditto.
(Constructor): Initialize ErrorDepList.
* gm2-compiler/M2MetaError.mod (doErrorScopeProc): Rewrite
and ensure that a symbol with a module scope does not lookup
from a definition module.
* gm2-compiler/P2SymBuild.mod (BuildType): Rewrite so that
a synonym type is created using the token referring to the name
on the lhs.

gcc/testsuite/ChangeLog:

PR modula2/120673
* gm2/pim/fail/badmodvar.mod: New test.
* gm2/pim/fail/cyclictypes.mod: New test.
* gm2/pim/fail/cyclictypes2.mod: New test.
* gm2/pim/fail/cyclictypes4.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Improve static and AFDO profile combination

This patch makes afdo_adjust_guessed_profile more aggressive in finding scales
on the boundaries of connected components with no annotation.  Originally I
looked for edges into or out of the component with known AFDO counts, and I
also handled edges from basic blocks with a known AFDO count and a known
static probability estimate.

A common problem is components that contain no in edges, only out
edges (i.e. those with ENTRY_BLOCK).  For this case I added logic that looks
for edges out of the component to BBs with a known AFDO count.  If all flow to
the BB is either from the component or has an AFDO count, we can determine the
scale precisely.  It may happen that there are edges from other components.  In
this case we know an upper bound and use it, since it is better than nothing.

I also noticed that some components have a 0 count in all of the profile and
then scaling gives up, which is fixed.  I also optimized the code a bit by
replacing the map holding the current component with an array holding the
component ID and broke out the scaling logic into separate functions.

The patch fixes the perl regression I introduced in the last change.
According to
https://lnt.opensuse.org/db_default/v4/SPEC/67674
there were improvements (percentage is runtime change):

538.imagick_r -32.52%
549.fotonik3d_r -22.68%
520.omnetpp_r -12.37%
503.bwaves_r -8.71%
508.namd_r -5.10%
526.blender_r -2.11%

and regressions:

554.roms_r 45.95%
527.cam4_r 21.69%
511.povray_r 13.59%
500.perlbench_r 10.19%
507.cactuBSSN_r 9.81%
510.parest_r 9.69%
548.exchange2_r 8.42%
502.gcc_r 5.10%
544.nab_r 3.76%
519.lbm_r 2.34%
541.leela_r 2.16%
525.x264_r 2.14%

This is a bit wild, but I hope things will settle down once we chase out
obvious problems (such as losing the profile of functions that have not been
inlined).

gcc/ChangeLog:

* auto-profile.cc (afdo_indirect_call): Compute speculative edge
probability.
(add_scale): Break out from ...
(scale_bbs): Break out from ...
(afdo_adjust_guessed_profile): ... here; use component array instead of
current_component hash_map; handle components with only 0 profile;
be more aggressive on finding scales along the boundary.

Fix cgraph_node::apply_scale

While working on auto-FDO I noticed that we may run into an ICE because we
inline a function with count profile_count::zero to a call site with
profile_count::zero.  What may go wrong is that the caller has a local profile
while the callee may have IPA profiles.

We used to turn all such counts to 0, but that was changed by a short circuit
I introduced recently.  Fixed thus.

* cgraph.cc (cgraph_node::apply_scale): Special case scaling
to profile_count::zero ().
(cgraph_node::verify_node): Add extra compatibility check.

Add testcase for AFDO early inlining and indirect call promotion

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: New test.

c++,coroutines: Handle await expressions in assume attributes.

Here we have an expression that is not evaluated but is still seen
as potentially-evaluated. We handle this by determining if the
operand has side-effects, producing a warning that the assume has
been ignored and eliding it.

gcc/cp/ChangeLog:

* coroutines.cc (analyze_expression_awaits): Elide assume
attributes containing await expressions, since these have
side effects. Emit a diagnostic that this has been done.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/assume.C: New test.

[PATCH v1] RISC-V: Use scratch reg for loop control

By using the scratch register for loop control rather than the output
of the lr instruction we can avoid an unnecessary "mv" instruction.

--
V2: Testcase update; no regressions found with the following changes.

gcc/ChangeLog:

* config/riscv/sync.md (lrsc_atomic_exchange<mode>): Use scratch
register for loop control rather than lr output.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zalrsc.c: New test.

c++: correct __is_trivially_destructible nargs [PR120678]

I missed adjusting the number of args when copying the
IS_TRIVIALLY_CONSTRUCTIBLE line to create IS_TRIVIALLY_DESTRUCTIBLE.
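
As an illustration of the difference in arity, the library-level counterparts
of the two built-ins behave as sketched below: the constructible trait takes a
type plus an optional list of constructor argument types, while the
destructible trait takes exactly one type.  The types Plain and Owning are
hypothetical and are not taken from the patch or its testcase.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical example types.
struct Plain { int x; };
struct Owning { ~Owning() {} };  // user-provided destructor: not trivial

// Variadic: a type followed by zero or more constructor argument types.
static_assert(std::is_trivially_constructible<Plain>::value, "");
static_assert(std::is_trivially_constructible<Plain, const Plain &>::value, "");

// Unary: exactly one type argument.
static_assert(std::is_trivially_destructible<Plain>::value, "");
static_assert(!std::is_trivially_destructible<Owning>::value, "");

constexpr bool plain_is_trivially_destructible()
{
  return std::is_trivially_destructible<Plain>::value;
}

constexpr bool owning_is_trivially_destructible()
{
  return std::is_trivially_destructible<Owning>::value;
}
```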

PR c++/120678

gcc/cp/ChangeLog:

* cp-trait.def (IS_TRIVIALLY_DESTRUCTIBLE): Fix nargs.

c++: modules and #pragma diagnostic

To respect the #pragma diagnostic lines in libstdc++ headers when compiling
with module std, we need to represent them in the module.

I think it's reasonable to give serializers direct access to the underlying
data, as here with get_classification_history. This is a different approach
from how Jakub made PCH streaming members of diagnostic_option_classifier,
but it seems to me that modules handling belongs in module.cc.

libcpp/ChangeLog:

* line-map.cc (linemap_location_from_module_p): Add.
* include/line-map.h: Declare it.

gcc/ChangeLog:

* diagnostic.h (diagnostic_option_classifier): Friend
diagnostic_context.
(diagnostic_context::get_classification_history): New.

gcc/cp/ChangeLog:

* module.cc (module_state::write_diagnostic_classification): New.
(module_state::write_begin): Call it.
(module_state::read_diagnostic_classification): New.
(module_state::read_initial): Call it.
(dk_string, dump_dc_change): New.

gcc/testsuite/ChangeLog:

* g++.dg/modules/warn-spec-3_a.C: New test.
* g++.dg/modules/warn-spec-3_b.C: New test.
* g++.dg/modules/warn-spec-3_c.C: New test.

crc: Fix up ICE from optimize_crc_loop [PR120677]

The following testcase ICEs, because optimize_crc_loop inserts a call
statement before labels instead of after labels.

Fixed thusly (plus fixed other issues noticed around it).

2025-06-17  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/120677
* gimple-crc-optimization.cc (crc_optimization::optimize_crc_loop):
Insert before gsi_after_labels instead of gsi_start_bb.  Use
gimple_bb (output_crc) instead of output_crc->bb.  Formatting fix.

* gcc.c-torture/execute/pr120677.c: New test.

aarch64: Add vec_set/extract for tuple modes [PR113027]

We generated inefficient code for bitfield references to Advanced
SIMD structure modes.  In RTL, these modes are just extra-long
vectors, and so inserting and extracting an element is simply
a vec_set or vec_extract operation.
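
Viewing a structure mode as one extra-long vector means a flat element index
can be split into "which vector of the tuple" and "which lane of that vector".
A minimal sketch of that arithmetic follows; decompose_index and TupleIndex
are hypothetical names and do not reflect the signature or implementation of
aarch64_decompose_vec_struct_index.

```cpp
#include <cassert>

// Hypothetical result type: position of an element inside a tuple of vectors.
struct TupleIndex {
  unsigned vector;  // which vector of the tuple
  unsigned lane;    // which lane within that vector
};

// Split a flat element index, assuming every vector in the tuple has
// lanes_per_vector lanes (lanes_per_vector must be nonzero).
constexpr TupleIndex decompose_index(unsigned flat_index,
                                     unsigned lanes_per_vector)
{
  return TupleIndex{flat_index / lanes_per_vector,
                    flat_index % lanes_per_vector};
}
```

For a tuple of 4-lane vectors, flat element 5 would map to lane 1 of the
second vector (vector index 1).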

For the record, I don't think these modes should ever become fully
fledged vector modes.  We shouldn't provide add, etc. for them.
But vec_set and vec_extract are the vector equivalent of insv
and extv.  From that point of view, they seem closer to moves
than to arithmetic.

gcc/
PR target/113027
* config/aarch64/aarch64-protos.h (aarch64_decompose_vec_struct_index):
Declare.
* config/aarch64/aarch64.cc (aarch64_decompose_vec_struct_index): New
function.
* config/aarch64/iterators.md (VEL, Vel): Add Advanced SIMD
structure modes.
* config/aarch64/aarch64-simd.md (vec_set<VSTRUCT_QD:mode>)
(vec_extract<VSTRUCT_QD:mode>): New patterns.

gcc/testsuite/
PR target/113027
* gcc.target/aarch64/pr113027-1.c: New test.
* gcc.target/aarch64/pr113027-2.c: Likewise.
* gcc.target/aarch64/pr113027-3.c: Likewise.
* gcc.target/aarch64/pr113027-4.c: Likewise.
* gcc.target/aarch64/pr113027-5.c: Likewise.
* gcc.target/aarch64/pr113027-6.c: Likewise.
* gcc.target/aarch64/pr113027-7.c: Likewise.

OpenMP: Fix implicit 'declare target' for <ostream>

libstdc++-v3/include/std/ostream contains:

  namespace std _GLIBCXX_VISIBILITY(default)
  {
    ...
    template<typename _CharT, typename _Traits>
      inline basic_ostream<_CharT, _Traits>&
      endl(basic_ostream<_CharT, _Traits>& __os)
      { return flush(__os.put(__os.widen('\n'))); }
  ...
  #include <bits/ostream.tcc>

and the latter, libstdc++-v3/include/bits/ostream.tcc, has:
    // Inhibit implicit instantiations for required instantiations,
    // which are defined via explicit instantiations elsewhere.
  #if _GLIBCXX_EXTERN_TEMPLATE
    extern template class basic_ostream<char>;
    extern template ostream& endl(ostream&);

Before this commit, omp_discover_declare_target_tgt_fn_r marked 'endl'
as (implicitly) declare target - but not the calls in it due to the
'extern' (DECL_EXTERNAL).

Thanks to inlining, 'endl' is not used and is, hence, discarded by the
linker; hence, it works with -O0 and -O1.  However, as the (unused)
function still exists, IPA CP (enabled by -O2) will try to do
constant-value propagation and fails as the definition of 'widen'
is not available.

The solution is to still walk 'endl' despite it being an 'extern(al)' decl;
this has been restricted for now to DECL_DECLARED_INLINE_P.

gcc/ChangeLog:

* omp-offload.cc (omp_discover_declare_target_tgt_fn_r): Also
walk external functions that are declared inline (and have a
DECL_SAVED_TREE).

libgomp/ChangeLog:

* testsuite/libgomp.c++/declare_target-2.C: New test.

c++, coroutines: Handle unevaluated contexts.

From [expr.await]/2
We should not accept co_await, co_yield in unevaluated contexts.

Currently (see PR68604) we do not mark typeid expressions as unevaluated
since the standard rules mean that this depends on the value type.

gcc/cp/ChangeLog:

* coroutines.cc (finish_co_await_expr): Do not allow in an
unevaluated context.
(finish_co_yield_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/unevaluated.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

c++, coroutines: Avoid UNKNOWN_LOCATION synthesizing code [PR120273].

Some of the lookup code is expecting to find a valid (not UNKNOWN)
location, which triggers in the reported case. To avoid this, we are
reverting the change to use UNKNOWN_LOCATION for synthesizing the
wrapper, and instead using the start and end locations of the original
function.

PR c++/120273

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Use
function start and end locations when synthesizing code.
(cp_coroutine_transform::cp_coroutine_transform): Set the
function end location.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr120273.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

RISC-V: Add -fno-pie flags to testcases

PIE may cause some code gen differences in the testcases, which causes
problems when the toolchain is configured with `--enable-default-pie`.

So add -fno-pie flags to the testcases to avoid this issue.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/jump-table-large-code-model.c: Adding
-fno-pie.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Ditto.

Daily bump.

cobol: Some 1000 small changes in answer to cppcheck diagnostics.

Constification per cppcheck. Use STRICT_WARN and fix reported
diagnostics. Ignore [shadowVariable] in general. Use std::vector to
avoid exposing arrays as raw pointers.
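
The kind of change meant by "use std::vector" can be sketched as below.  Both
functions are hypothetical illustrations and are not taken from the COBOL
front end: the first passes an array as a raw pointer plus a separate count,
the second passes a const reference to a vector that carries its own size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: nothing ties the pointer to the element count.
inline long sum_subscripts_raw(const int *subscripts, std::size_t count)
{
  long total = 0;
  for (std::size_t i = 0; i < count; ++i)
    total += subscripts[i];
  return total;
}

// After: the container knows its size and the callee cannot modify it.
inline long sum_subscripts(const std::vector<int> &subscripts)
{
  long total = 0;
  for (int s : subscripts)
    total += s;
  return total;
}
```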

PR cobol/120621

gcc/cobol/ChangeLog:

* Make-lang.in: Use STRICT_WARN.
* cbldiag.h (location_dump): Suppress shadowVariable.
* cdf-copy.cc (esc): Fix shadowVariable.
(copybook_elem_t::open_file): Do not use %m.
* cdf.y: Suppress invalidPrintfArgType for target format.
* cdfval.h (struct cdfval_t): Suppress noExplicitConstructor.
* cobol1.cc (cobol_name_mangler): Use C++ cast.
* copybook.h (class copybook_elem_t): Same.
* dts.h: Fixes and suppressions due to cppcheck.
* except.cc (cbl_enabled_exceptions_t::status): Suppress useStlAlgorithm.
(cbl_enabled_exceptions_t::turn_on_off): Const parameter.
(class choose_declarative): Removed.
* genapi.cc (struct called_tree_t): Explicit constructor.
(parser_compile_ecs): Cast to void * for %p.
(parser_compile_dcls): Same.
(parser_statement_begin): Same.
(initialize_variable_internal): Use std::vector for subscripts.
(parser_initialize): Constification.
(get_string_from): Same.
(combined_name): Same.
(parser_perform): Same.
(psa_FldLiteralN): Same.
(is_figconst): Const parameter.
(is_figconst_t): Same.
(parser_exit): Same.
(parser_division): Const pointer.
(parser_perform_conditional): Whitespace.
(parser_set_conditional88): Const parameter.
(inspect_tally): Use std::vector.
(inspect_replacing): Same.
(parser_inspect): Same.
(parser_intrinsic_subst): Use std::vector (construct elements).
(parser_intrinsic_call_1): Use std::vector for subscripts.
(is_ascending_key): Const pointer.
(parser_sort): Use std::vector.
(parser_file_sort): Same.
(parser_file_merge): Same.
(parser_unstring): Same.
(parser_string): Same.
(parser_call): Const pointer.
(parser_program_hierarchy): Use std::vector.
(conditional_abs): Const parameter.
(float_type_of): Same.
(initial_from_initial): Set value, quoted or not.
(parser_symbol_add): Remove redundant nested test.
* genapi.h (parser_add): Const parameters.
(parser_subtract): Same.
(parser_multiply): Same.
(parser_divide): Same.
(parser_perform): Same.
(parser_exit): Same.
(parser_initialize): Same.
(parser_set_conditional88): Same.
(parser_sort): Same.
(parser_file_sort): Same.
(parser_file_merge): Same.
(parser_string): Same.
(is_ascending_key): Same.
* genmath.cc (arithmetic_operation): Use std::vector.
(is_somebody_float): Const parameter.
(all_results_binary): Const parameter.
(fast_multiply): Remove redundant nested test.
(parser_add): Const parameter.
(parser_multiply): Remove redundant nested test.
(parser_divide): Const parameter.
(parser_subtract): Same.
* genutil.cc (get_depending_on_value): Use std::vector.
(get_data_offset): Same.
(tree_type_from_field): Const parameter.
(refer_has_depends): Const pointers.
(get_literal_string): RAII.
(refer_is_clean): Use std::vector.
(get_time_nanoseconds): Newline at EOF.
* genutil.h (tree_type_from_field): Remove declaration.
* inspect.h (struct cbx_inspect_qual_t): Use std::vector.
(struct cbl_inspect_qual_t): Same.
(struct cbx_inspect_match_t): Same.
(class cbl_inspect_match_t): Same.
(struct cbx_inspect_replace_t): Same.
(struct cbl_inspect_replace_t): Same.
(struct cbx_inspect_oper_t): Same.
(struct cbl_inspect_oper_t): Same.
(struct cbx_inspect_t): Same.
(struct cbl_inspect_t): Same.
(parser_inspect): Same.
* lexio.cc (indicated): Const pointer.
(remove_inline_comment): Scope reduction.
(maybe_add_space): Const pointer.
(recognize_replacements): C++ cast.
(check_source_format_directive): Same.
(struct replacing_term_t): Explicit constructor.
(parse_replace_pairs): Const reference.
(location_in): Const reference.
(parse_copy_directive): C++ cast.
(parse_replace_last_off): Const parameter.
(parse_replace_text): Const reference.
(parse_replace_directive): C++ cast.
(cdftext::lex_open): Const reference.
(cdftext::open_output): Scope reduction.
(cdftext::free_form_reference_format): Remove unused variable.
(cdftext::process_file): Simplify.
* lexio.h (struct bytespan_t): Use nullptr.
(struct filespan_t): Initialize icol in constructor.
(struct span_t): Suppress confused operatorEqRetRefThis.
(struct replace_t): Eliminate single-value constructor.
* parse.y: Many const cppcheck reports, and portable bit-shift.
* parse_ante.h (reject_refmod): Const parameter.
(require_pointer): Same.
(require_integer): Same.
(struct evaluate_elem_t): Explicit constructor.
(struct arith_t): Use std::vector.
(class eval_subject_t): Const parameter.
(dump_inspect_match): Declare.
(struct perform_t): Explicit constructor.
(list_add): Const parameter.
(class tokenset_t): Avoid negative array index.
(struct file_list_t): Explicit constructor.
(struct field_list_t): Same.
(struct refer_list_t): Same.
(struct refer_marked_list_t): Const parameter.
(struct refer_collection_t): Explicit constructor.
(struct ast_inspect_oper_t): Remove class.
(ast_inspect_oper_t): Same.
(struct ast_inspect_t): Same.
(struct ast_inspect_list_t): Same.
(ast_inspect): Add location.
(struct elem_list_t): Explicit constructor.
(struct unstring_tgt_t): Same.
(struct unstring_tgt_list_t): Same.
(struct unstring_into_t): Same.
(struct ffi_args_t): Same.
(struct file_sort_io_t): Same.
(merge_t): Same.
(struct vargs_t): Same.
(class prog_descr_t): Eliminate single-value constructor.
(class program_stack_t): Suppress useStlAlgorithm.
(struct rel_part_t): Eliminate single-value constructor.
(class log_expr_t): Explicit constructor.
(add_debugging_declarative): Rename local variable.
(intrinsic_call_2): Const parameter.
(invalid_key): Use std::find_if.
(parser_add2): Const parameter.
(parser_subtract2): Same.
(stringify): Same.
(unstringify): Same.
(anybody_redefines): Same.
(ast_call): Same.
* parse_util.h (class cname_cmp): Explicit constructor.
(intrinsic_inconsistent_parameter): Same.
* scan_ante.h (struct cdf_status_t): Eliminate single-value constructor.
(class enter_leave_t): Explicit constructor.
(update_location): Const pointer, explicit constructor.
(symbol_function_token): Const pointer.
(typed_name): Same.
* scan_post.h (datetime_format_of): Scope reduction.
* show_parse.h (class ANALYZE): Use std::vector, explicit constructor.
* symbols.cc (symbol_table_extend): Scope reduction.
(cbl_ffi_arg_t::cbl_ffi_arg_t): Define default constructor.
(end_of_group): Const pointer.
(symbol_find_odo): Const parameter.
(rename_not_ok): Same.
(field_str): Use %u instead of %d.
(struct capacity_of): Const pointer.
(symbols_update): Same.
(symbol_field_parent_set): Same.
(symbol_file_add): Same.
(symbol_typedef_add): Same.
(symbol_field_add): Use new operator=().
(symbol_field): Suppress CastIntegerToAddressAtReturn.
(symbol_register): Same.
(symbol_file): Suppress knownConditionTrueFalse.
(next_program): Const parameter.
(symbol_file_record): Same.
(class is_section): Explicit constructor.
(cbl_file_t::no_key): Remove.
(cbl_prog_hier_t::cbl_prog_hier_t): Use std::vector.
(symbol_label_add): Assert pointer is not NULL.
(symbol_label_section_exists): Const reference in lambda.
(expand_picture): Use C++ cast.
(symbol_program_callables): Const pointer.
(symbol_currency_add): Suppress nullPointerRedundantCheck.
(cbl_key_t): Use std::vector.
(cbl_occurs_t::field_add): Const parameter.
(cbl_occurs_t::index_add): Explicit constructor.
(class is_field_at): Same.
(cbl_file_key_t::deforward): Scope reduction.
(cbl_file_t::keys_str): Use allocated memory only.
(file_status_status_of): Const pointer.
(is_register_field): Const parameter.
* symbols.h (struct cbl_field_data_t): Eliminate single-value constructor.
(struct cbl_occurs_bounds_t): Same.
(struct cbl_refer_t): Use std::vector.
(valid_move): Const parameter.
(is_register_field): Same.
(struct cbl_key_t): Use std::vector.
(struct cbl_substitute_t): Eliminate single-value constructor.
(refer_of): Return const reference.
(struct cbl_ffi_arg_t): Eliminate single-value constructor.
(class temporaries_t): Same.
(struct cbl_file_key_t): Define default constructor.
(struct cbl_file_lock_t): Define copy constructor and operator=().
(struct cbl_file_t): Complete default constructor.
(struct symbol_elem_t): Explicit constructor.
(symbol_elem_of): Suppress cstyleCast.
(symbol_redefines): Const parameter.
(struct cbl_field_t): Same.
(cbl_section_of): Test for NULL pointer.
(cbl_field_of): Same.
(cbl_label_of): Same.
(cbl_special_name_of): Same.
(cbl_alphabet_of): Same.
(cbl_file_of): Same.
(is_figconst): Delete extra "struct" keyword.
(is_figconst_low): Same.
(is_figconst_zero): Same.
(is_figconst_space): Same.
(is_figconst_quote): Same.
(is_figconst_high): Same.
(is_space_value): Same.
(is_quoted): Same.
(symbol_index): Const parameter.
(struct cbl_prog_hier_t): Suppress noExplicitConstructor.
(struct cbl_perform_vary_t): Eliminate single-value constructor.
(is_signable): Const parameter.
(is_temporary): Same.
(rename_not_ok): Same.
(field_at): Test for NULL pointer.
(class procref_base_t): Eliminate single-value constructor.
* symfind.cc (is_data_field): Const pointer.
(finalize_symbol_map2): Same.
(class in_scope): Same.
(symbol_match2): Same.
* token_names.h: Suppress useInitializationList.
* util.cc (normalize_picture): Whitespace and remove extra "continue".
(redefine_field): Const pointer.
(cbl_field_t::report_invalid_initial_value): Same.
(literal_subscript_oob): Rename shadow variable.
(cbl_refer_t::subscripts_set): Use std::vector.
(cbl_refer_t::str): Same.
(cbl_refer_t::deref_str): Same.
(locally_unique): Use explicit constructor.
(ambiguous_reference): Same.
(class unique_stack): Use const reference.
(cobol_filename): Const pointer.
(verify_format): Scope reduction.
(class temp_loc_t): Do not derive from YYLTYPE.
(cobol_parse_files): Const pointer.
* util.h (as_voidp): Define convenient converter.

libgcobol/ChangeLog:

* common-defs.h (class cbl_enabled_exceptions_t): Const parameter.

aarch64: Add support for unpacked SVE FP conversions

This patch introduces expanders for FP<-FP conversions that leverage
partial vector modes.  We also extend the INT<-FP and FP<-INT conversions
using the same approach.

The ACLE enables vectorized conversions like the following:

fcvt z0.h, p7/m, z1.s

modelling the source vector as VNx4SF:

... |     SF|     SF|     SF|     SF|

and the destination as a VNx8HF, where this operation would yield:

... | 0 | HF| 0 | HF| 0 | HF| 0 | HF|

hence the useful results are stored unpacked, i.e.

... | X | HF| X | HF| X | HF| X | HF| (VNx4HF)

This patch allows the vectorizer to use this variant of fcvt as a
conversion from VNx4SF to VNx4HF.  The same idea applies to widening
conversions, and between vectors with FP and integer base types.

If the source itself had been unpacked, e.g.

... |   X   |     SF|   X   |     SF| (VNx2SF)

The result would yield

... | X | X | X | HF| X | X | X | HF| (VNx2HF)

The upper bits of each container here are undefined; it's important to
avoid interpreting them during FP operations - doing so could introduce
spurious traps.  The obvious route we've taken here is to mask undefined
lanes using the operation's predicate if we have flag_trapping_math.

The VPRED predicate mode (e.g. VNx2BI here) cannot do this; to ensure
correct behavior, we need a predicate mode that can control the data as if
it were fully-packed (VNx4BI).

Both VNx2BI and VNx4BI must be recognised as legal governing predicate modes
by the corresponding FP insns.  In general, the governing predicate mode for
an insn could be any such with at least as many significant lanes as the data
mode.  For example, addvnx4hf3 could be controlled by any of VNx{4,8,16}BI.

We implement 'aarch64_predicate_operand', a new define_special_predicate, to
achieve this.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_sve_valid_pred_p):
Declare helper for aarch64_predicate_operand.
(aarch64_sve_packed_pred): Declare helper for new expanders.
(aarch64_sve_fp_pred): Likewise.
* config/aarch64/aarch64-sve.md (<optab><mode><v_int_equiv>2):
Extend into...
(<optab><SVE_HSF:mode><SVE_HSDI:mode>2): New expander for converting
vectors of HF,SF to vectors of HI,SI,DI.
(<optab><VNx2DF_ONLY:mode><SVE_2SDI:mode>2): New expander for converting
vectors of SI,DI to vectors of DF.
(*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>):
New pattern to match those we've added here.
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>): Extend
into...
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><SVE_SI:mode>): Match both
VNx2SI<-VNx2DF and VNx4SI<-VNx4DF.
(<optab><v_int_equiv><mode>2): Extend into...
(<optab><SVE_HSDI:mode><SVE_F:mode>2): New expander for converting vectors
of HI,SI,DI to vectors of HF,SF,DF.
(*aarch64_sve_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>): New
pattern to match those we've added here.
(trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>2): New expander to handle
narrowing ('truncating') FP<-FP conversions.
(*aarch64_sve_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>): New
pattern to handle those we've added here.
(extend<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>2): New expander to handle
widening ('extending') FP<-FP conversions.
(*aarch64_sve_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>): New
pattern to handle those we've added here.
* config/aarch64/aarch64.cc (aarch64_sve_packed_pred): New function.
(aarch64_sve_fp_pred): Likewise.
(aarch64_sve_valid_pred_p): Likewise.
* config/aarch64/iterators.md (SVE_PARTIAL_HSF): New mode iterator.
(SVE_HSF): Likewise.
(SVE_SDF): Likewise.
(SVE_SI): Likewise.
(SVE_2SDI): Likewise.
(self_mask):  Extend to all integer/FP vector modes.
(narrower_mask): Likewise (excluding QI).
* config/aarch64/predicates.md (aarch64_predicate_operand): New special
predicate to handle narrower predicate modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Disable the aarch64 vector
cost model to preserve this test.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cvtf_1.c: New test.
* gcc.target/aarch64/sve/unpacked_cvtf_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cvtf_3.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvt_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvt_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvtz_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fcvtz_2.c: Likewise.

aarch64: Extend iterator support for partial SVE FP modes

Define new iterators for partial floating-point modes, and cover these
in some existing mode_attrs. This patch serves as a starting point for
an effort to extend support for unpacked floating-point operations.

To differentiate between BFloat mode iterators that need to test
TARGET_SSVE_B16B16, and those that don't (see LOGICALF), this patch
enforces the following naming convention:
- _BF: BF16 modes will not test TARGET_SSVE_B16B16.
- _B16B16: BF16 modes will test TARGET_SSVE_B16B16.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md: Replace uses of SVE_FULL_F_BF
with SVE_FULL_F_B16B16.
Replace use of SVE_F with SVE_F_BF.
* config/aarch64/iterators.md (SVE_PARTIAL_F): New iterator for
partial SVE FP modes.
(SVE_FULL_F_BF): Rename to SVE_FULL_F_B16B16.
(SVE_PARTIAL_F_B16B16): New iterator (BF16 included) for partial
SVE FP modes.
(SVE_F_B16B16): New iterator for all SVE FP modes.
(SVE_BF): New iterator for all SVE BF16 modes.
(SVE_F): Redefine to exclude BF16 modes.
(SVE_F_BF): New iterator to replace the previous SVE_F.
(VPRED): Describe the VPRED mapping for partial vector modes.
(b): Cover partial FP modes.
(is_bf16): Likewise.

Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]

In ALLOCATE statements where the MOLD= argument is present and is not
scalar, and the allocate-object has an explicit-shape-spec, the standard
does not require the ranks to agree. In that case we skip the rank check,
but emit a warning if -Wsurprising is given.

PR fortran/51961

gcc/fortran/ChangeLog:

* resolve.cc (conformable_arrays): Use modified rank check when
MOLD= expression is given.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocate_with_mold_5.f90: New test.

c++: add -Wsfinae-incomplete

We already error about a type or function definition causing a concept check
to change value, but it would be useful to diagnose this for other SFINAE
contexts as well; the memoization problem also affects templates.  So
-Wsfinae-incomplete remembers if we've failed a requirement for a complete
type/deduced return type in a non-tf_error context, and later warns if the
type/function becomes complete.
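
A minimal sketch of the memoization problem described above, assuming a
hand-written completeness trait (is_complete and Widget are hypothetical names
and are not taken from the patch or its testcases): the SFINAE check fails
while the type is incomplete, the result is remembered for that
specialization, and completing the type later does not change it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical trait: the partial specialization is viable only if
// sizeof(T) is valid, i.e. T is complete at the point of the check.
template <class T, class = void>
struct is_complete : std::false_type {};

template <class T>
struct is_complete<T, decltype(void(sizeof(T)))> : std::true_type {};

struct Widget;  // incomplete here

// The completeness requirement fails quietly in a SFINAE context.
constexpr bool complete_before = is_complete<Widget>::value;

struct Widget { int id; };  // the type becomes complete afterwards

// The specialization instantiated above is reused, so the answer is stale.
constexpr bool complete_after = is_complete<Widget>::value;
```

Both constants end up false here, which is the kind of surprise the warning is
meant to flag at the point where Widget is defined.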

This warning is enabled by default; I think the signal-to-noise ratio is
high enough to warrant that, and it catches things that are likely to make
the program "ill-formed, no diagnostic required".

friend87.C is an interesting case; this could be considered a false positive
because it is using friend injection to define the auto function to
implement a compile-time counter.  I think this is sufficiently pathological
that it's fine to expect people who want to play this sort of game to
suppress the warning.

The data for this warning uses GTY((cache)) to persist through GC, but allow
entries to be discarded if the key is not otherwise marked.

I don't think it's desirable to export/import this information in modules,
it makes sense for it to be local to a single TU.

-Wsfinae-incomplete=2 adds a warning at the point of failure, which is
primarily intended to help with debugging warnings from the default mode.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wsfinae-incomplete.

gcc/c-family/ChangeLog:

* c.opt: Add -Wsfinae-incomplete.
* c.opt.urls: Regenerate.

gcc/cp/ChangeLog:

* constraint.cc (failed_completions_map): New.
(note_failed_type_completion): Rename from
note_failed_type_completion_for_satisfaction.  Add
-Wsfinae-incomplete handling.
(failed_completion_location): New.
* class.cc (finish_struct_1): Add -Wsfinae-incomplete warning.
* decl.cc (require_deduced_type): Adjust.
(finish_function): Add -Wsfinae-incomplete warning.
* typeck.cc (complete_type_or_maybe_complain): Adjust.
(cxx_sizeof_or_alignof_type): Call note_failed_type_completion.
* pt.cc (dependent_template_arg_p): No longer static.
* cp-tree.h: Adjust.

libstdc++-v3/ChangeLog:

* testsuite/20_util/is_complete_or_unbounded/memoization.cc
* testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc:
Expect -Wsfinae-incomplete.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend87.C
* g++.dg/cpp2a/concepts-complete1.C
* g++.dg/cpp2a/concepts-complete2.C
* g++.dg/cpp2a/concepts-complete3.C
* g++.dg/cpp2a/concepts-complete4.C: Expect -Wsfinae-incomplete.

RISC-V: Refine VX combine test case 0 to avoid code duplication

Case 0 of the vx combine def functions is mostly the same across
the different test files.  Thus, re-arrange them in one place to
avoid code duplication.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Leverage
helper macros to avoid code duplication.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add
signed and unsigned vx combine test macros.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: add support for AEABI Build Attributes

GCS (Guarded Control Stack, an Armv9.4-a extension) requires some
caution at runtime. The runtime linker needs to reason about the
compatibility of a set of relocable object files that might not
have been compiled with the same compiler.
Up until now, this metadata, used for the previously mentioned
runtime checks, has been provided to the runtime linker via GNU
properties which are stored in the ELF section ".note.gnu.property".
However, GNU properties are limited in their expressibility, and a
long-term commitment was made in the ABI for the Arm architecture
[1] to provide Build Attributes (a.k.a. BAs).

This patch adds the support for emitting AArch64 Build Attributes.
This support includes generating two new assembler directives:
.aeabi_subsection and .aeabi_attribute. These directives are generated
as per the syntax mentioned in spec "Build Attributes for the Arm®
64-bit Architecture (AArch64)" available at [1].

gcc/configure.ac now includes a new check to test whether the
assembler being used to build the toolchain supports these new
directives.
Two behaviors can be observed when -mbranch-protection=[standard|...]
is passed:
- If the assembler supports BAs, GCC emits the BA directives and
no GNU properties. Note: the static linker will derive the values
of GNU properties from the BAs, and will emit both BAs and GNU
properties into the output object.
- If the assembler does not support them, only the .note.gnu.property
section will contain the relevant information.

Bootstrapped on aarch64-none-linux-gnu, and no regression found.

[1]: https://github.com/ARM-software/abi-aa/pull/230

gcc/ChangeLog:

* config.in: Regenerate.
* config/aarch64/aarch64-elf-metadata.h
(class aeabi_subsection): New class for BAs.
* config/aarch64/aarch64-protos.h
(aarch64_pacret_enabled): New function.
* config/aarch64/aarch64.cc
(HAVE_AS_AEABI_BUILD_ATTRIBUTES): New definition.
(aarch64_file_end_indicate_exec_stack): Emit BAs.
(aarch64_pacret_enabled): New function.
(aarch64_start_file): Indent.
* configure: Regenerate.
* configure.ac: New configure check for BAs support in binutils.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp:
(check_effective_target_aarch64_gas_has_build_attributes): New checker.
* gcc.target/aarch64/build-attributes/aarch64-build-attributes.exp: New DejaGNU file.
* gcc.target/aarch64/build-attributes/build-attribute-bti.c: New test.
* gcc.target/aarch64/build-attributes/build-attribute-gcs.c: New test.
* gcc.target/aarch64/build-attributes/build-attribute-pac.c: New test.
* gcc.target/aarch64/build-attributes/build-attribute-standard.c: New test.
* gcc.target/aarch64/build-attributes/no-build-attribute-bti.c: New test.
* gcc.target/aarch64/build-attributes/no-build-attribute-gcs.c: New test.
* gcc.target/aarch64/build-attributes/no-build-attribute-pac.c: New test.
* gcc.target/aarch64/build-attributes/no-build-attribute-standard.c: New test.

Co-Authored-By: Srinath Parvathaneni <srinath.parvathaneni@arm.com>

aarch64: encapsulate note.gnu.property emission into a class

The code emitting the GNU properties was moved to a separate file to
improve modularity and relieve the 31000-line aarch64.cc file
of a few lines.

It introduces a new namespace "aarch64::" for the AArch64 backend, which
reduces the length of function names by not prepending 'aarch64_' to
each of them.

gcc/ChangeLog:

* Makefile.in: Add missing declaration of BACKEND_H.
* config.gcc: Add aarch64-elf-metadata.o to extra_objs.
* config/aarch64/aarch64-elf-metadata.h: New file.
* config/aarch64/aarch64-elf-metadata.cc: New file.
* config/aarch64/aarch64.cc
(GNU_PROPERTY_AARCH64_FEATURE_1_AND): Removed.
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI): Likewise.
(GNU_PROPERTY_AARCH64_FEATURE_1_PAC): Likewise.
(GNU_PROPERTY_AARCH64_FEATURE_1_GCS): Likewise.
(aarch64_file_end_indicate_exec_stack): Move GNU properties code to
aarch64-elf-metadata.cc
* config/aarch64/t-aarch64: Declare target aarch64-elf-metadata.o

c++: ICE with unexpanded pack in asm

Here an unexpanded parameter pack is passed into asm_operand, which doesn't
expect to see an operand without a type. So use check_for_bare_parameter_packs
to remedy that.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_operand_list): Check for unexpanded
parameter packs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-crash7.C: New test.

aarch64: add debug comments to feature properties in .note.gnu.property

GNU properties are emitted to provide some information about the features
used in the generated code like BTI, GCS, or PAC. However, no debug
comments are emitted in the generated assembly even if -dA is provided.
It makes understanding the information stored in the .note.gnu.property
section more difficult than needed.

This patch adds assembly comments (if -dA is provided) next to the GNU
properties. For instance, if BTI and PAC are enabled, it will emit:
.word 0x3 // GNU_PROPERTY_AARCH64_FEATURE_1_AND (BTI, PAC)
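The mapping from the feature mask to the comment text can be sketched as
below. This is only an illustration and not the code in aarch64.cc: the helper
name feature_1_and_comment and the shortened constant names are hypothetical,
while the bit values are assumed to follow the
GNU_PROPERTY_AARCH64_FEATURE_1_{BTI,PAC,GCS} definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit values assumed to match GNU_PROPERTY_AARCH64_FEATURE_1_{BTI,PAC,GCS}.
constexpr std::uint32_t FEATURE_1_BTI = 1u << 0;
constexpr std::uint32_t FEATURE_1_PAC = 1u << 1;
constexpr std::uint32_t FEATURE_1_GCS = 1u << 2;

// Hypothetical helper: build the text that would follow "//" in the
// assembly output for a given feature mask.
std::string feature_1_and_comment(std::uint32_t mask)
{
  // Only the three feature bits above are modelled in this sketch.
  assert((mask & ~(FEATURE_1_BTI | FEATURE_1_PAC | FEATURE_1_GCS)) == 0);

  std::string names;
  auto add = [&](std::uint32_t bit, const char* name) {
    if (mask & bit)
      {
        if (!names.empty())
          names += ", ";
        names += name;
      }
  };
  add(FEATURE_1_BTI, "BTI");
  add(FEATURE_1_PAC, "PAC");
  add(FEATURE_1_GCS, "GCS");
  return "GNU_PROPERTY_AARCH64_FEATURE_1_AND (" + names + ")";
}
```

With mask 0x3 the helper yields the comment text shown in the .word line above.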

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_file_end_indicate_exec_stack): Emit assembly comments.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bti-1.c: Emit assembly comments, and update
test assertion.

Combine static and afdo branch predictions

Currently afdo reads the profile and annotates basic blocks containing
statements which have samples in profile data.  For basic blocks which have been
fully optimized out (for example, basic blocks controlling loops that have been
fully unrolled) it has no data, which it then tries to determine in
afdo_propagate using Kirchhoff's law.

The problem is that often there is not enough info to solve it. In that
case a few tricks are applied and then the algorithm gives up.

In all cases where it gave up, the count is then set to AFDO 0 and consequently
we end up with basic blocks having 0 counts in hot regions of program and we
can not trust those 0s much when optimizing.

This patch attempts to preserve the static profile in regions where we have no
info.  After the propagation, connected regions are identified and the existing
profile is scaled to fit the profile data.  For single-entry-single-exit regions
this is the correct answer.  For other regions we could theoretically try to
adjust the static profile, but no attempt is made to do so, to keep things simple.
The static profile has quality GUESSED while AFDO data has quality AFDO, which
makes it possible to distinguish them later.
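The scaling step for a single-entry-single-exit region can be sketched as
below. This is a simplified illustration under stated assumptions, not
afdo_adjust_guessed_profile itself: the helper name scale_guessed_count is
hypothetical, counts are plain integers rather than profile_count, and
overflow is not handled.

```cpp
#include <cstdint>

// Hypothetical helper: rescale a statically guessed block count so that the
// region's entry count matches the count known from AFDO data.
//   guessed        static (GUESSED) count of the block being adjusted
//   guessed_entry  static count of the region entry
//   afdo_entry     count of the region entry according to AFDO
constexpr std::uint64_t
scale_guessed_count(std::uint64_t guessed, std::uint64_t guessed_entry,
                    std::uint64_t afdo_entry)
{
  if (guessed_entry == 0)
    return 0; // nothing to scale against in this sketch
  // Multiply by afdo_entry / guessed_entry, rounding to nearest.
  return (guessed * afdo_entry + guessed_entry / 2) / guessed_entry;
}
```

For example, a block guessed at half the entry frequency keeps that ratio after
scaling: with a guessed entry count of 100 and an AFDO entry count of 1000, a
guessed count of 50 becomes 500.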

afdo_adjust_guessed_profile does the profile adjustment. The rest of the changes
prevent the code from tampering with counts of basic blocks that can not
be fully determined.  The propagation has some tricks to compute a lower bound
for some basic blocks on the boundary of annotated regions and I am not trying
to preserve that.

We can end up with connected components where we can not determine the count.
This happens in practice in hot code, e.g. for the SPEC2017 perl benchmark. I plan
to handle this incrementally. Current code will simply set the profile as undefined
in those regions, which works worse than 0 and thus we get a regression in perl.
With this changed to 0, I now get the same SPEC2017 score as without profiling.

The patch also makes gcc completely ignore info about basic blocks which
do have statements with an actual 0 AFDO profile.  Since the profile
generation tool cuts the profile at 2%, I think we should keep a low guessed
profile there instead of 0.  This is another step I plan to work on incrementally
this week.

Bootstrapped/regtested x86-64, committed.

gcc/ChangeLog:

* auto-profile.cc (edge_set): Remove unused typedef.
(is_bb_annotated): Sanity check that annotated BBs have
quality AFDO and non-annotated non-AFDO.  Exceptions are
zeros.
(set_bb_annotated): Verify that BB set annotated has
AFDO profile.
(afdo_set_bb_count): Do not return true for 0 counts.
(afdo_find_equiv_class): Fix formatting;
do not combine profile of annotated and non-annotated BBs.
(afdo_propagate_edge): Fix variable names; dump info
about changes; do not change non-annotated BB profiles;
if all flow out of BB was decided on, annotate remaining
edges with 0.
(afdo_propagate): Dump info about copied BB counts
and number of iterations used.
(cmp): New function.
(afdo_adjust_guessed_profile): New function.
(afdo_calculate_branch_prob): Do not initialize loop
optimizer here; call afdo_adjust_guessed_profile.
(afdo_annotate_cfg): Initialize profile here;
annotate entry/exit blocks only if profile is non-0.
* profile-count.h: (profile_count::force_guessed): New.
* tree-cfg.cc (gimple_verify_flow_info): Fix typo.

RISC-V: Update Profiles string in RV23.

Add b-ext in RVA/B23 as independent extension flags and add supm in
RVA23.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add b-ext and supm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-53.c: Update testcase.

xtensa: Revert "xtensa: Eliminate unwanted reg-reg moves during DFmode input reloads"

Since there are no unwanted reg-reg moves during DFmode input reloads in
recent GCCs, the previously committed patch
"xtensa: eliminate unwanted reg-reg moves during DFmode input reloads"
(commit cfad4856fa46abc878934a9433d0bfc2482ccf00) is no longer necessary
and is therefore being reverted.

gcc/ChangeLog:

* config/xtensa/predicates.md (reload_operand):
Remove.
* config/xtensa/xtensa.md:
Remove the peephole2 pattern that was previously added.

xtensa: Revert "xtensa: Eliminate unnecessary general-purpose reg-reg moves"

Due to improved register allocation for GP registers whose modes have been
changed by paradoxical SUBREGs, the previously committed patch
"xtensa: eliminate unnecessary general-purpose reg-reg moves"
(commit f83e76c3f998c8708fe2ddca16ae3f317c39c37a) is no longer necessary
and is therefore reverted.

gcc/ChangeLog:

* config/xtensa/xtensa.md:
Remove the peephole2 pattern that was previously added.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/elim_GP_regmove_0.c: Remove.
* gcc.target/xtensa/elim_GP_regmove_1.c: Remove.

simplify-rtx.cc: Simplify XOR(AND(ROTATE(~1) A) ASHIFT(1 A)) to IOR.

This patch adds a new simplification rule to `simplify-rtx.cc` that
handles a common bit manipulation pattern involving a single-bit set
and clear followed by XOR.

The transformation targets RTL of the form:

  (xor (and (rotate (~1) A) B) (ashift 1 A))

which is semantically equivalent to:

  B | (1 << A)
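
The equivalence can be checked on plain 32-bit integers with the sketch below.
It is not part of the patch: the helper names are hypothetical, and masking the
shift count with 31 is an assumption standing in for SHIFT_COUNT_TRUNCATED
behavior. Rotating ~1 left by A gives a mask with only bit A clear, so the AND
clears bit A of B and the XOR then sets it.

```cpp
#include <cstdint>

// Rotate left on 32 bits; the count is truncated to 0..31.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n)
{
  n &= 31;
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

// (xor (and (rotate (~1) A) B) (ashift 1 A))
constexpr std::uint32_t xor_form(std::uint32_t b, unsigned a)
{
  return (rotl32(~std::uint32_t{1}, a) & b) ^ (std::uint32_t{1} << (a & 31));
}

// B | (1 << A)
constexpr std::uint32_t ior_form(std::uint32_t b, unsigned a)
{
  return b | (std::uint32_t{1} << (a & 31));
}

// Compare both forms for every in-range shift count.
constexpr bool forms_agree(std::uint32_t b)
{
  for (unsigned a = 0; a < 32; ++a)
    if (xor_form(b, a) != ior_form(b, a))
      return false;
  return true;
}
```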

- v3 log:
  Update RTL format, remove commas.
  Only apply on SHIFT_COUNT_TRUNCATED target.
  check '!side_effects_p' on XEXP (op1, 1).

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Handle
more logical simplifications.

Daily bump.

RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vmaxu.vv combine to vmaxu.vx,
with the GR2VR cost is 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vmaxu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check test for vec_duplicate + vmaxu.vv combine to vmaxu.vx,
with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vmaxu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost

This patch combines vec_duplicate + vmaxu.vv into vmaxu.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR.  Late-combine will then perform the combination
if the GR2VR cost is zero, and reject it if the GR2VR cost is greater
than zero.

Assume we have example code like below, with GR2VR cost 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, /)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmaxu.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmaxu.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3
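
The listings select vmaxu, i.e. an element-wise unsigned maximum of a vector
against a scalar. A source loop of that shape might look like the sketch below.
The names umax and vx_umax are hypothetical; this is not the testsuite macro,
and whether a given compiler vectorizes it this way depends on options and
tuning.

```cpp
#include <cstdint>

// Scalar unsigned maximum, the operation vmaxu performs per element.
constexpr std::uint32_t umax(std::uint32_t a, std::uint32_t b)
{
  return a > b ? a : b;
}

// out[i] = max(in[i], x) for n elements; the scalar x is the operand that a
// vec_duplicate would otherwise broadcast into a vector register.
constexpr void vx_umax(std::uint32_t* out, const std::uint32_t* in,
                       std::uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = umax(in[i], x);
}
```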

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case UMAX.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op umax.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

AVR: Fix PR120423 / PR116389.

The problem with PR120423 and PR116389 is that reload might assign an invalid
hard register to a paradoxical subreg.  For example with the test case from
the PR, it assigns (REG:QI 31) to the inner of (subreg:HI (QI) 0) which is
valid, but the subreg will be turned into (REG:HI 31) which is invalid
and triggers an ICE in postreload.

The problem only occurs with the old reload pass.

The patch maps the paradoxical subregs to zero-extends, which will be
allocated correctly.  For the 120423 testcases, the code is the same as
with -mlra (which doesn't implement the fix), so the patch doesn't even
introduce a performance penalty.

The patch is only needed for v15:  v14 is not affected, and in v16 reload
will be removed.

PR rtl-optimization/120423
PR rtl-optimization/116389
gcc/
* config/avr/avr.md [-mno-lra]: Add pre-reload split to transform
(left shift of) a paradoxical subreg to a (left shift of) zero-extend.
gcc/testsuite/
* gcc.target/avr/torture/pr120423-1.c: New test.
* gcc.target/avr/torture/pr120423-2.c: New test.
* gcc.target/avr/torture/pr120423-116389.c: New test.

(cherry picked from commit 61789b5abec3079d02ee9eaa7468015ab1f6f701)

c++, coroutines: Improve diagnostics for awaiter/promise.

At present, we can issue diagnostics about missing or malformed
awaiter or promise methods when we encounter their uses in the
body of a user's function. We might then re-issue the same
diagnostics when processing the initial or final await expressions.

This change avoids such duplication, and also attempts to
identify issues with the initial or final expressions specifically
since diagnostics for those do not have any useful line number.

gcc/cp/ChangeLog:

* coroutines.cc (build_co_await): Identify diagnostics
for initial and final await expressions.
(cp_coroutine_transform::wrap_original_function_body): Do
not handle initial and final await expressions here ...
(cp_coroutine_transform::apply_transforms): ... handle them
here and avoid duplicate diagnostics.
* coroutines.h: Declare initial and final await expressions
in the transform class. Save the function closing brace
location.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro1-missing-await-method.C: Adjust for
improved diagnostics.
* g++.dg/coroutines/coro-missing-final-suspend.C: Likewise.
* g++.dg/coroutines/pr104051.C: Move to...
* g++.dg/coroutines/pr104051-0.C: ...here.
* g++.dg/coroutines/pr104051-1.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Daily bump.

c++, coroutines: Handle builtin_constant_p [PR116775].

Since the folding of this builtin happens after the main coroutine FE
lowering, we need to account for await expressions in that lowering.

Since these expressions have a property of being not evaluated, but do
not have the full constraints of an unevaluated context, we want to
apply the checks and then remove the await expressions so that they no
longer participate in the analysis and lowering.

When a builtin_constant_p call is encountered, and the operand contains
any await expression, we check to see if the operand can be a constant
and replace the call with its result.

PR c++/116775

gcc/cp/ChangeLog:

* coroutines.cc (analyze_expression_awaits): When we see
a builtin_constant_p call, and that contains one or more
await expressions, then replace the call with its result
and discard the unevaluated operand.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116775.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

c++, coroutines: Ensure that the resumer is marked as can_throw.

We must flag that the resumer might throw (since the wrapping of the
original function body unconditionally adds a try-catch/rethrow). We
also add code that might throw - even when the original function body
would not.

TODO: We could improve code-gen by recognising cases where the combined
body + initial await expressions cannot throw and omitting the unneeded
try/catch/rethrow wrapper.

gcc/cp/ChangeLog:

* coroutines.cc (build_actor_fn): Set can_throw.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

expand: Add a helper function for edge splitting [PR120629]

On Fri, Jun 13, 2025 at 08:52:55AM +0100, Richard Sandiford wrote:
> But now that there are two instances, I wonder if it would
> be worth hiding this detail in a helper function?

Here it is.

2025-06-13 Jakub Jelinek <jakub@redhat.com>

PR middle-end/120629
* cfgexpand.cc (expand_split_edge): New function.
(expand_gimple_cond, construct_init_block): Use it.

aarch64: Fold NOT+PTEST to NOTS [PR118150]

Add combiner patterns for folding NOT+PTEST to NOTS when they share
the same governing predicate.

gcc/ChangeLog:
PR target/118150
* config/aarch64/aarch64-sve.md (*one_cmpl<mode>3_cc): New
combiner pattern.
(*one_cmpl<mode>3_ptest): Likewise.

gcc/testsuite/ChangeLog:
PR target/118150
* gcc.target/aarch64/sve/acle/general/not_1.c: New test.

libstdc++: Fix std::uninitialized_value_construct for arrays [PR120397]

The std::uninitialized_{value,default}_construct{,_n} algorithms should
be able to create arrays, but that currently fails because when an
exception happens they clean up using std::_Destroy and in C++17 that
doesn't support destroying arrays. (For C++20 and later, std::destroy
does handle destroying arrays.)

This commit adjusts the _UninitDestroyGuard RAII type used by those
algos so that in C++17 mode it recursively destroys each rank of an
array type, only using std::_Destroy for the last rank when it's
destroying non-array objects.
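
The rank-by-rank destruction can be sketched as below. This is a minimal
illustration assuming C++17, not the libstdc++ _S_destroy member: the names
destroy_rank, Tracked and live_after_round_trip are hypothetical, and the plain
destructor call stands in for std::_Destroy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Destroy *p.  If T is an array type, recurse into each element so that
// only the innermost rank sees a non-array object.
template<typename T>
void destroy_rank(T* p)
{
  if constexpr (std::is_array_v<T>)
    for (auto& elem : *p)
      destroy_rank(std::addressof(elem));
  else
    p->~T();
}

// Counts live objects so the effect of destroy_rank can be observed.
struct Tracked
{
  static inline int live = 0;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};

// Construct a Tracked[2][3] in raw storage, destroy it through
// destroy_rank, and report how many objects are still alive.
int live_after_round_trip()
{
  using Arr = Tracked[2][3];
  alignas(Tracked) unsigned char storage[sizeof(Arr)];
  Tracked* first = reinterpret_cast<Tracked*>(storage);
  for (std::size_t i = 0; i < 6; ++i)
    ::new (static_cast<void*>(first + i)) Tracked;
  assert(Tracked::live == 6);
  destroy_rank(reinterpret_cast<Arr*>(storage));
  return Tracked::live;
}
```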

libstdc++-v3/ChangeLog:

PR libstdc++/120397
* include/bits/stl_uninitialized.h (_UninitDestroyGuard<I,void>):
Add new member function _S_destroy and call it from the
destructor (for C++17 only).
* testsuite/20_util/specialized_algorithms/uninitialized_default_construct/120397.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct/120397.cc:
New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Format %r, %x and %X using locale's time_put facet [PR120648]

Similarly to the issue reported for %c in PR117214, the format strings for the
locale-specific time (%r, %X) and date (%x) representations may contain specifiers
not accepted by chrono-spec, leading to an exception being thrown. This
happened for the following conversion specifier and locale combinations:
* %r, %X for aa_DJ.UTF-8, ar_SA.UTF-8
* %x for ca_AD.UTF-8, my_MM.UTF-8

This fix follows the approach from r15-8490-gc24a1d5, and uses time_put to emit
the localized date format. The existing _M_c is reworked to handle all
locale-dependent conversion specifiers, by accepting them as an argument. It is
also renamed to _M_c_r_x_X.

PR libstdc++/120648

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Handle %c, %r, %x and %X by passing them to _M_c_r_x_X.
(__formatter_chrono::_M_c_r_x_X): Reworked from _M_c.
(__formatter_chrono::_M_c): Renamed into above.
(__formatter_chrono::_M_r, __formatter_chrono::_M_x)
(__formatter_chrono::_M_X): Removed.
* testsuite/std/time/format/pr117214.cc: New tests for %r, %x,
%X with date, time and durations.

libstdc++: Optimize __make_comp/pred_proj for empty/scalar types

When creating a composite comparator/predicate that invokes a given
projection function, we don't need to capture a scalar (such as a
function pointer or member pointer) or empty object by reference,
instead capture it by value and use [[no_unique_address]] to elide
its storage (in the empty case). This makes using __make_comp_proj
zero-cost in the common case where both functions are empty/scalars.
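
A simplified sketch of the by-value form is shown below. It is not the
libstdc++ implementation: comp_proj, make_comp_proj and abs_proj are
hypothetical names, and this version always stores by value, whereas the
patch chooses between by-reference and by-value per type.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a composite comparator: compares the
// projections of both arguments.  Empty members may share storage on
// ABIs that honor [[no_unique_address]].
template<typename Comp, typename Proj>
struct comp_proj
{
  [[no_unique_address]] Comp comp;
  [[no_unique_address]] Proj proj;

  template<typename T, typename U>
  constexpr bool operator()(T&& lhs, U&& rhs) const
  {
    return comp(proj(std::forward<T>(lhs)), proj(std::forward<U>(rhs)));
  }
};

template<typename Comp, typename Proj>
constexpr comp_proj<Comp, Proj> make_comp_proj(Comp comp, Proj proj)
{
  return {comp, proj};
}

// Example projection: an empty function object taking the absolute value.
struct abs_proj
{
  constexpr int operator()(int x) const { return x < 0 ? -x : x; }
};
```

For instance, make_comp_proj(std::less<>{}, abs_proj{}) compares two ints by
absolute value without holding a reference to either function object.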

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__by_ref_or_value_fn): New.
(__detail::_Comp_proj): New.
(__detail::__make_comp_proj): Use it instead.
(__detail::_Pred_proj): New.
(__detail::__make_pred_proj): Use it instead.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: add a workaround for format_kind<optional<T>> [PR120644]

The specialization of format_kind for optional is causing a problem when
optional is imported and included. The comments on the PR strongly
suggest that this is a frontend bug; this commit just works around the
issue by specifying the type of format_kind<optional<T>> to be
`range_format`, rather than letting the compiler deduce it via `auto`.
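
The shape of the workaround can be sketched with stand-in declarations, as
below. Everything here is hypothetical scaffolding rather than the libstdc++
code: range_format mirrors the enumerators of std::range_format, opt stands in
for std::optional, and the primary template's value is chosen only so the
sketch compiles.

```cpp
// Stand-in for std::range_format.
enum class range_format { disabled, map, set, sequence, string, debug_string };

// Stand-in for std::optional.
template<typename T>
struct opt { };

// Stand-in primary variable template whose type is deduced.
template<typename T>
inline constexpr auto format_kind = range_format::sequence;

// The specialization spells out its type instead of using auto.
template<typename T>
inline constexpr range_format format_kind<opt<T>> = range_format::disabled;
```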

PR c++/120644

libstdc++-v3/ChangeLog:

* include/std/optional (format_kind): Do not use `auto`.

testsuite: Fix pr119160.c for non-glibc targets [PR119862]

Testcase pr119160.c fails with symbol referencing errors for
`__cyg_profile_func_enter` and `__cyg_profile_func_exit` on non-glibc
systems.

This patch adds empty definitions for `__cyg_profile_func_enter`
and `__cyg_profile_func_exit` in order to prevent those errors.
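
Empty definitions of this kind can be sketched as below. The testcase itself is
C; this sketch is written as C++ and therefore wraps the hooks in extern "C".
The no_instrument_function attribute is an addition assumed here so that the
hooks are not instrumented themselves; it is not stated in the commit message.

```cpp
extern "C" {

// Empty entry hook called by code built with -finstrument-functions.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site)
{
  (void) this_fn;
  (void) call_site;
}

// Empty exit hook, the counterpart of the entry hook above.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site)
{
  (void) this_fn;
  (void) call_site;
}

}
```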

PR testsuite/119862

gcc/testsuite/ChangeLog:

* gcc.dg/pr119160.c: Added empty definitions for
`__cyg_profile_func_enter` and `__cyg_profile_func_exit`
functions.

expand: Fix up edge splitting for ENTRY block during expansion if there are any PHIs [PR120629]

Andrew ran some extra ranger checking during bootstrap and found one more
case (though much rarer than the GIMPLE_COND case).

It seems that in fold-const.cc (native_encode_expr) we end up with bb 2, the
ENTRY bb successor, having PHI nodes (usually there is some bb in between, even
if empty; in native_encode_expr it is tail recursion, but I haven't managed
to construct a test for such a case by hand).
So, we have in optimized dump
  <bb 2> [local count: 1089340384]:
  # expr_12 = PHI <expr_199(D)(0), part_93(51)>
  # ptr_13 = PHI <ptr_86(D)(0), ptr_13(51)>
  # len_14 = PHI <len_103(D)(0), _198(51)>
  # off_10 = PHI <off_102(D)(0), _207(51)>
  # add_acc_99 = PHI <0(0), add_acc_101(51)>
where there are mostly default defs from the 0->2 edge (and one zero)
and some other values from the other edge.
construct_init_block inserts a BB_RTL basic block with the function start
instructions and similarly to the GIMPLE_COND case it wants to insert that
bb on the edge from ENTRY to its single successor.
Now, without this patch redirect_edge_succ redirects the 0->2 edge to 0->52,
so the 51->2 edge gets moved first by unordered_remove, and
make_single_succ_edge adds a new 52->2 edge.  So we end up with
  # expr_12 = PHI <expr_199(D)(51), part_93(52)>
  # ptr_13 = PHI <ptr_86(D)(51), ptr_13(52)>
  # len_14 = PHI <len_103(D)(51), _198(52)>
  # off_10 = PHI <off_102(D)(51), _207(52)>
  # add_acc_99 = PHI <0(51), add_acc_101(52)>
which is not correct, the default definitions and zero are now from the edge
from end of function and the other values from the edge from the new BB_RTL
successor of ENTRY.  With this patch we get
  # expr_12 = PHI <expr_199(D)(52), part_93(51)>
  # ptr_13 = PHI <ptr_86(D)(52), ptr_13(51)>
  # len_14 = PHI <len_103(D)(52), _198(51)>
  # off_10 = PHI <off_102(D)(52), _207(51)>
  # add_acc_99 = PHI <0(52), add_acc_101(51)>
instead.
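
The reordering can be modelled with a two-entry predecessor list, as in the
sketch below. It is a toy model rather than the cfgexpand.cc code: PredSources
and both function names are hypothetical, each slot holds the source block of
the edge with that dest_idx, and PHI arguments are assumed to stay attached to
their slot.

```cpp
#include <array>
#include <cstddef>

// Source block of the predecessor edge at each dest_idx.
using PredSources = std::array<int, 2>;

// Model of redirecting the edge at slot idx away and then adding a new
// edge: the last edge is moved into the vacated slot (unordered remove)
// and the new edge is appended at the end.
constexpr PredSources split_without_fix(PredSources preds, std::size_t idx,
                                        int new_src)
{
  preds[idx] = preds[preds.size() - 1];
  preds[preds.size() - 1] = new_src;
  return preds;
}

// Same, followed by swapping the new edge back into the original slot so
// that each PHI argument keeps its matching incoming edge.
constexpr PredSources split_with_fix(PredSources preds, std::size_t idx,
                                     int new_src)
{
  PredSources r = split_without_fix(preds, idx, new_src);
  const int tmp = r[idx];
  r[idx] = r[r.size() - 1];
  r[r.size() - 1] = tmp;
  return r;
}
```

Starting from sources {0, 51} and inserting block 52 on the edge in slot 0, the
model gives {51, 52} without the swap and {52, 51} with it, matching the block
numbers in the two PHI dumps above.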

2025-06-13  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120629
* cfgexpand.cc (construct_init_block): If first_block isn't BB_RTL,
has any PHI nodes and false_edge->dest_idx before redirection is
different from make_single_succ_edge result's dest_idx, swap the
latter with the former last pred edge and their dest_idx members.

libstdc++: Replace _CharT template parameter with CharT in format tests.

As pointed out by Daniel Krügler we do not need to use reserved name
in tests.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/bool/format.cc: Replaced _CharT
with CharT.
* testsuite/std/format/debug.cc: Likewise.
* testsuite/std/format/ranges/adaptors.cc: Likewise.
* testsuite/std/format/ranges/formatter.cc: Likewise.
* testsuite/std/format/ranges/map.cc: Likewise.
* testsuite/std/format/ranges/sequence.cc: Likewise.
* testsuite/std/format/ranges/string.cc: Likewise.
* testsuite/std/format/tuple.cc: Likewise.
* testsuite/std/time/format/empty_spec.cc: Likewise.
* testsuite/std/time/format/pr120114.cc: Likewise.
* testsuite/std/time/format/pr120481.cc: Likewise.
* testsuite/std/time/format/precision.cc: Likewise.

libstdc++: Rework formatting of empty chrono-spec for duration.

In contrast to other calendar types, if an empty chrono-spec is used for duration
we are required to format it (and its representation type) via ostream.
Handling of this case has now been moved into the format function
for duration. To facilitate that, the __formatter_chrono::_M_format_to_ostream
function was made public.

However, for standard integral types, we know the result of inserting
them into ostream, and in consequence we can format them directly. This
is handled by configuring default format spec to "%Q%q" for such types.

As we no longer use __formatter_chrono::_M_format with empty chrono-spec,
this function now requires that _M_chrono_specs are not empty,
and conditional call to _M_format_to_ostream is removed. This allows
_M_format_to_ostream to be reduced to accept only duration.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format):
Remove handling of empty _M_chrono_specs.
(__formatter_chrono::_M_format_to_ostream): Changed to accept
only chrono::duration and made public.
(std::formatter<chrono::duration<_Rep, _Period>, _CharT>):
Configure __defSpec and handle empty chrono-spec locally.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Format empty chrono-spec for the sys_info and local_info directly.

This patch changes the implementation of the formatters for sys_info and local_info,
so they no longer delegate to operator<< for ostream in case of an empty spec.
As these types may only be formatted with a chrono-spec containing only %%, %t, %n
specifiers and fill characters, we use a separate __formatter_chrono_info formatter.

For an empty chrono-spec, __formatter_chrono_info formats sys_info using a format_to
call with the format specifier extracted from the corresponding operator<<, which now
delegates to format with an empty spec. For local_info we replicate the functionality
of operator<<. The alignment and padding are handled using a _Padding_sink.

For a non-empty spec, we delegate to __formatter_chrono::_M_format. As none of the
format specifiers depend on the formatted object, we pass chrono::day to avoid
triggering additional specializations.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::__formatter_chrono_info)
[_GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI]: Define.
(std::formatter<chrono::sys_info, _CharT>)
(std::formatter<chrono::local_info, _CharT>): Delegate to
__format::__formatter_chrono_info.
(std::operator<<(basic_ostream<_CharT, _Traits>&, const sys_info&)):
Use format on sys_info with empty format spec.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Test chrono-spec containing only whitespaces.

libstdc++-v3/ChangeLog:

* testsuite/std/time/format/whitespace.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

driver: Try to read spec from gcc_exec_prefix if possible

GCC will try to read the spec file from the directory where it is
installed, but it should try to read from gcc_exec_prefix rather than
standard_exec_prefix, because the latter is not the right one if the
compiler has been relocated to a location other than the path
specified at configuration time.

gcc/ChangeLog:

* gcc.cc (driver::set_up_specs): Use gcc_exec_prefix to
read the spec file rather than standard_exec_prefix.

mcore: Don't use gen_rtx_MEM on __attribute__((dllimport))

On mcore-elf, mcore_mark_dllimport generated

(gdb) call debug_tree (decl)
<function_decl 0x7fffe9941200 f1
    type <function_type 0x7fffe981f000
        type <void_type 0x7fffe98180a8 void VOID
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffe98180a8
            pointer_to_this <pointer_type 0x7fffe9818150>>
        HI
        size <integer_cst 0x7fffe9802738 constant 16>
        unit-size <integer_cst 0x7fffe9802750 constant 2>
        align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffe981f000
        arg-types <tree_list 0x7fffe980b988 value <void_type 0x7fffe98180a8 void>>
        pointer_to_this <pointer_type 0x7fffe991b0a8>>
    addressable used public external decl_5 SI /tmp/x.c:1:40 align:16 warn_if_not_align:0 context <translation_unit_decl 0x7fffe9955080 /tmp/x.c>
    attributes <tree_list 0x7fffe9932708
        purpose <identifier_node 0x7fffe9954000 dllimport>>
    (mem:SI (mem:SI (symbol_ref:SI ("@i.__imp_f1")) [0  S4 A32]) [0  S4 A32]) chain <function_decl 0x7fffe9941300 f2>>

which caused:

(gdb) bt
    file=0x2c0f1c8 "/export/gnu/import/git/sources/gcc-test/gcc/calls.cc",
    line=3746, function=0x2c0f747 "expand_call")
    at /export/gnu/import/git/sources/gcc-test/gcc/diagnostic.cc:1780
    target=0x0, ignore=1)
    at /export/gnu/import/git/sources/gcc-test/gcc/calls.cc:3746
...
(gdb) call debug_rtx (datum)
(mem:SI (symbol_ref:SI ("@i.__imp_f1")) [0  S4 A32])
(gdb)

Don't use gen_rtx_MEM in mcore_mark_dllimport to generate

(gdb) call debug_tree (fndecl)
<function_decl 0x7fffe9941200 f1
    type <function_type 0x7fffe981f000
        type <void_type 0x7fffe98180a8 void VOID
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffe98180a8
            pointer_to_this <pointer_type 0x7fffe9818150>>
        HI
        size <integer_cst 0x7fffe9802738 constant 16>
        unit-size <integer_cst 0x7fffe9802750 constant 2>
        align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffe981f000
        arg-types <tree_list 0x7fffe980b988 value <void_type 0x7fffe98180a8 void>>
        pointer_to_this <pointer_type 0x7fffe991b0a8>>
    addressable used public external decl_5 SI /tmp/x.c:1:40 align:16 warn_if_not_align:0 context <translation_unit_decl 0x7fffe9955080 /tmp/x.c>
    attributes <tree_list 0x7fffe9932708
        purpose <identifier_node 0x7fffe9954000 dllimport>>
    (mem:SI (symbol_ref:SI ("@i.__imp_f1")) [0  S4 A32]) chain <function_decl 0x7fffe9941300 f2>>
(gdb)

instead.  This fixes:

gcc.c-torture/compile/dll.c -O0 (internal compiler error: in assemble_variable, at varasm.cc:2544)
gcc.dg/visibility-12.c (internal compiler error: in expand_call, at calls.cc:3744)

for mcore-elf.

PR target/120589
* config/mcore/mcore.cc (mcore_mark_dllimport): Don't use
gen_rtx_MEM.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

Update gcc es.po

* es.po: Update.

recip: Reset range info when replacing sqrt with rsqrt [PR120638]

This pass reuses an SSA_NAME on the lhs of a sqrt etc. call as the lhs
of a .RSQRT etc. call. The following testcase is miscompiled since my recent
ranger cast changes, because we compute the (correct) range for the sqrtf argument
as well as the result, but then the recip pass keeps using that range for the .RSQRT
call, which returns 1. / sqrt, so the function then returns 0.5f
unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
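
As a rough illustration of why the old range must be dropped, here is a small
sketch in plain C++ rather than GIMPLE.  The Range type and sqrt_range helper
are hypothetical and only model the idea; they are not ranger's API.  A range
that is correct for sqrt (x) is generally not a range for 1 / sqrt (x), so
reusing the SSA_NAME without resetting its flow-sensitive info leaves a
wrong range attached to the new result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical closed interval [lo, hi]; stands in for recorded range info.
struct Range
{
  float lo, hi;
  bool contains (float v) const { return lo <= v && v <= hi; }
};

// Range of sqrt (x) when x is known to lie in the non-negative range X.
Range
sqrt_range (Range x)
{
  assert (x.lo >= 0.0f && x.lo <= x.hi);
  return { std::sqrt (x.lo), std::sqrt (x.hi) };
}
```

For x in [4, 16] the recorded range is [2, 4]; sqrt (9) = 3 lies inside it,
while 1 / sqrt (9) does not, so the range is stale once the call is replaced.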

2025-06-12 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.

* gcc.dg/pr120638.c: New test.

testsuite: Add testcase for already fixed PR [PR120630]

These tests were broken by my r16-1398 PR120434 change and
fixed by r16-1482 PR120629 change.

Committing these to increase testsuite coverage.

2025-06-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/120630
* gcc.dg/pr120630.c: New test.
* gcc.c-torture/execute/pr120630.c: New test.

libstdc++: do not use an unreserved name in _Temporary_buffer [PR119496]

As the PR observes, _Temporary_buffer was using an unreserved name for a
member function that can therefore clash with macros defined by the
user. Avoid that by renaming the member function.
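
A minimal sketch of the kind of clash involved.  The class below is a
hypothetical stand-in, not the real _Temporary_buffer; it uses an _M_ name
only to mimic the library's convention (user code must not declare such
names itself).

```cpp
#include <cassert>
#include <cstddef>

// A user may define a macro with any unreserved name before including
// a standard header.
#define requested_size 42

// Hypothetical stand-in for a library class template's internals.
class temp_buffer_sketch
{
  std::ptrdiff_t _M_original_len;

public:
  explicit temp_buffer_sketch (std::ptrdiff_t n) : _M_original_len (n)
  { assert (n >= 0); }

  // Had this member been spelled "requested_size", the macro above would
  // rewrite the declaration to "42 () const", which does not compile.
  std::ptrdiff_t _M_requested_size () const { return _M_original_len; }
};
```

With the reserved spelling, the user macro and the member no longer interact.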

PR libstdc++/119496

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h: Adjust calls to requested_size.
* include/bits/stl_tempbuf.h (requested_size): Rename with
an _M_ prefix.
* testsuite/17_intro/names.cc: Add a #define for
requested_size.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>

libstdc++: add range support to std::optional (P3168)

This commit implements P3168 ("Give std::optional Range Support"), added
for C++26. Both begin() and end() are straightforward, implemented using
normal_iterator over a raw pointer.

std::optional is also a view, so specialize enable_view for it.

We also need to disable automatic formatting a std::optional as a range
by specializing format_kind. In order to avoid dragging <format> when
including <optional>, I've isolated format_kind and some supporting code
into <bits/formatfwd.h> so that I can use that (comparatively) lighter
header.
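
To illustrate the idea of iterating an optional through a raw pointer, here
is a small sketch that builds in C++17.  The free functions opt_begin,
opt_end and count_elements are hypothetical helpers for illustration only;
the patch itself adds begin() and end() as members and wraps the pointer in
an iterator type.

```cpp
#include <cassert>
#include <optional>

// Pointer to the contained value, or a null pointer when disengaged.
template<typename T>
const T*
opt_begin (const std::optional<T>& o)
{ return o.has_value () ? &*o : nullptr; }

// One past the contained value, so the range has 0 or 1 elements.
template<typename T>
const T*
opt_end (const std::optional<T>& o)
{ return opt_begin (o) + (o.has_value () ? 1 : 0); }

// Walk the range like any other [begin, end) pair.
template<typename T>
int
count_elements (const std::optional<T>& o)
{
  int n = 0;
  for (const T* p = opt_begin (o); p != opt_end (o); ++p)
    ++n;
  return n;
}
```

An engaged optional therefore behaves like a one-element range and a
disengaged one like an empty range.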

libstdc++-v3/ChangeLog:

* include/bits/formatfwd.h (format_kind): Move the definition
(and some supporting code) from <format>.
* include/std/format (format_kind): Likewise.
* include/bits/version.def (optional_range_support): Add
the feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/optional (iterator, const_iterator, begin, end):
Add range support.
(enable_view): Specialize for std::optional.
(format_kind): Specialize for std::optional.
* testsuite/20_util/optional/range.cc: New test.
* testsuite/20_util/optional/version.cc: Test the new
feature-testing macro.

or1k: Fix ICE in libgcc caused by recent validate_subreg changes

After commit eb2ea476db2 ("emit-rtl: Allow extra checks for
paradoxical subregs [PR119966]") paradoxical subregs of the OpenRISC
condition flag register (reg:BI sr_f) are no longer allowed.

This causes an ICE in the ce1 pass which tries to get the or1k flag
register into an SI register, which is no longer possible.

Adjust or1k_can_change_mode_class to allow changing the or1k flag reg to
SI mode which in turn allows paradoxical subregs to be generated again.

gcc/ChangeLog:

PR target/120587
* config/or1k/or1k.cc (or1k_can_change_mode_class): Allow
changing flags mode from BI to SI to allow for paradoxical
subregs.

libstdc++: Format empty chrono-spec for the time points and hh_mm_ss directly.

This patch changes the implementation of the formatters for time points and
hh_mm_ss, so they no longer delegate to operator<< for ostream in case of an
empty chrono-spec.  As in the case of calendar types, the formatters for a
specific type now provide __formatter_chrono with a default _ChronoSpec that
is used in case of an empty chrono-spec.

The configuration of __defSpec is straightforward, except for sys_time
and local_time, which print only the date if the duration is convertible to
days, which is equivalent to setting _M_chrono_specs "%F" instead of "%F %T".
Furthermore, certain sys_time<Dur> do not support ostream operator, and
should not be formattable with empty spec - in such case default
_M_chrono_spec, allowing the issue to still be detected in _M_parse.

Finally, _ChronoFormats are extended to cover required format strings.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_ChronoFormats::_S_ftz)
(_ChronoFormats::_S_ft, _ChronoFormats::_S_t): Define.
(__formatter_chrono::_M_format_to_ostream): Remove handling for
time_points.
(std::formatter<chrono::hh_mm_ss<_Dur>, _CharT>)
(std::formatter<chrono::sys_time<_Dur>, _CharT>)
(std::formatter<chrono::utc_time<_Dur>, _CharT>)
(std::formatter<chrono::tai_time<_Dur>, _CharT>)
(std::formatter<chrono::gps_time<_Dur>, _CharT>)
(std::formatter<chrono::file_time<_Dur>, _CharT>)
(std::formatter<chrono::local_time<_Dur>, _CharT>)
(std::formatter<chrono::__detail::__local_time_fmt<_Dur>, _CharT>)
(std::formatter<chrono::zoned_time<_Dur>, _CharT>):
Define __defSpec, and pass it as argument to _M_parse and
constructor of __formatter_chrono.

libstdc++: Format empty chrono-spec for the calendar types directly.

This patch changes the implementation of the formatters for the calendar types,
so they no longer delegate to operator<< for ostream in case of an empty chrono-spec.
Instead of that, we define the behavior in terms of format specifiers
supplied by each formatter as an argument to _M_parse. Similarly each formatter
constructs its __formatter_chrono from a relevant default spec, preserving the
functionality of calling format on default constructed formatters.

Expressing the existing functionality of the ostream operators requires
providing two additional features:
* printing "is not a valid sth" for !ok objects,
* printing a weekday index in the month.

The formatter functionality is enabled by setting spec _M_debug (corresponding
to '?') that is currently unused. This is currently supported only for the
subset of format specifiers used by the ostream operators. In the future, we could
make this user configurable (by adding '?' after 'L') and cover all flags.

For the handling of the weekday index (for weekday_indexed, month_weekday,
year_month_weekday), we need to introduce a new format specifier. To not
conflict with future extension we use '%\0' (embedded null) as this character
cannot be placed in valid format spec.

Finally, the format strings for calendar types are subsets of each other, e.g.
year_month_weekday_last ("%Y/%b/%a[last]") contains month_weekday_last,
weekday_last, weekday, etc. We introduce a _ChronoFormats class that provides
consteval accessors to format specs, internally sharing their representations.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::_ChronoFormats): Define.
(__formatter_chrono::__formatter_chrono())
(__formatter_chrono::__formatter_chrono(_ChronoSpec<_CharT>)): Define.
(__formatter_chrono::_M_parse): Add parameter with default spec,
and merge it with new values. Handle '%\0' as weekday index
specifier.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
(__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
(__formatter_chrono::_M_F): Support _M_debug flag.
(__formatter_chrono::_M_wi, __formatter_chrono::_S_weekday_index):
Define.
(std::formatter<chrono::day, _CharT>)
(std::formatter<chrono::month, _CharT>)
(std::formatter<chrono::year, _CharT>)
(std::formatter<chrono::weekday, _CharT>)
(std::formatter<chrono::weekday_indexed, _CharT>)
(std::formatter<chrono::weekday_last, _CharT>)
(std::formatter<chrono::month_day, _CharT>)
(std::formatter<chrono::month_day_last, _CharT>)
(std::formatter<chrono::month_weekday, _CharT>)
(std::formatter<chrono::month_weekday_last, _CharT>)
(std::formatter<chrono::year_month, _CharT>)
(std::formatter<chrono::year_month_day, _CharT>)
(std::formatter<chrono::year_month_day_last, _CharT>)
(std::formatter<chrono::year_month_weekday, _CharT>)
(std::formatter<chrono::year_month_weekday_last, _CharT>):
Define __defSpec, and pass it as argument to _M_parse and
constructor of __formatter_chrono.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

i386: Fix signed integer overflow in ix86_expand_int_movcc, part 2 [PR120604]

Make sure we can represent the difference between two 64-bit DImode immediate
values in 64-bit HOST_WIDE_INT and return false if this is not the case.

ix86_expand_int_movcc is used in mov<mode>cc expander. Expander will FAIL
when the function returns false and middle-end will retry expansion with
values forced to registers.
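
A minimal sketch of such a representability check, written as standalone
C++ with std::int64_t standing in for HOST_WIDE_INT.  The function name
diff_fits is hypothetical and this is not the code from the patch; it only
shows how the difference can be tested without performing the overflowing
subtraction.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Store CT - CF in *DIFF and return true if the difference is
// representable in 64 bits; otherwise return false and leave *DIFF alone.
bool
diff_fits (std::int64_t ct, std::int64_t cf, std::int64_t *diff)
{
  assert (diff != nullptr);
  const std::int64_t min = std::numeric_limits<std::int64_t>::min ();
  const std::int64_t max = std::numeric_limits<std::int64_t>::max ();

  // Subtracting a positive CF can only underflow; a negative CF can only
  // overflow.  Both bounds are computed without overflowing themselves.
  if ((cf > 0 && ct < min + cf) || (cf < 0 && ct > max + cf))
    return false;

  *diff = ct - cf;
  return true;
}
```

A caller would treat a false result the way the expander treats a failed
expansion: give up on the immediate form and let the values be forced to
registers.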

PR target/120604

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_int_movcc): Make sure
we can represent the difference between two 64-bit DImode
immediate values in 64-bit HOST_WIDE_INT.

expand: Fix up edge splitting for GIMPLE_COND expansion if there are any PHIs [PR120629]

My r16-1398 PR120434 ranger during expansion change broke profiled lto
bootstrap on x86_64-linux, the following testcase is reduced from that.

The problem is during expand_gimple_cond, if we are unlucky that neither
of edge_true and edge_false point to the next basic block, the code
effectively attempts to split the false_edge and make the new bb BB_RTL
with some extra instructions which just arranges to jump.
It does it by creating a new bb, redirecting the false_edge and then
creating a new edge from the new bb to the dest.
Note, we don't have GIMPLE cfg hooks installed anymore and even if we
would, the 3 calls aren't the same as one split_edge with transformation
of the new bb into BB_RTL and adding it some BB_HEAD/BB_END.  If
false_edge->dest is BB_RTL or doesn't have PHI nodes (which before my
patch was always the case), then this works fine, but with
PHI nodes on false_edge->dest redirect_edge_succ will remove the false_edge
from dest->preds (unordered remove which moves into its place the last edge
in the vector) and the new make_edge will then add the new edge as last
in the vector.  So, unless false_edge is the last edge in the dest->preds
vector this effectively swaps the last edge in the vector with
false_edge/its new replacement.
gimple_split_edge solves this by temporarily clearing phi_nodes on dest
(not needed when we don't have GIMPLE hooks), then making the new edge
first and redirecting the old edge (plus restoring phi_nodes on dest).
That way the redirection replaces the old edge with the new one and
PHI arguments don't need adjustment.  At the cost of temporarily needing
one more edge in the vector and so if unlucky reallocation.
Doing it like that is one of the options (i.e. just move the
make_single_succ_edge call).  This patch instead keeps doing what it did
and just swaps two edges again if needed to restore the PHI behavior
- remember edge_false->dest_idx first if there are PHI nodes in
edge_false->dest and afterwards if new edge's dest_idx is different from
the remembered one, swap the new edge with EDGE_PRED (dest, old_dest_idx).
That way PHI arguments are maintained properly as well.  Without this
we sometimes just swap PHI arguments.

In particular we had
  # ivtmp.24_52 = PHI <ivtmp.24_49(10), 1(6)>
on bb 8 (dest) and edge_false is the 10->8 edge.  We create a new
BB_RTL bb 15 on this edge, redirect the 10->8 edge to 10->15 which
does unordered_remove and so the bb8->preds edge vec is just 6->8,
PHIs not touched as in IR_RTL_CFGRTL mode.  Then a new 15->8 edge is
created.  Without the patch we get
  # ivtmp.24_52 = PHI <ivtmp.24_49(6), 1(15)>
which is wrong, while with this patch we get
  # ivtmp.24_52 = PHI <ivtmp.24_49(15), 1(6)>
which matches just the addition of (for ranger uninteresting) BB_RTL
on the 10->15->8 edge.
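
The reordering can be modelled with a plain vector.  This is only a sketch:
Preds, unordered_remove and split_edge_sketch are hypothetical stand-ins
for the predecessor edge vector and the redirect/make-edge sequence, not
the cfgexpand.cc code.  Each entry is the source block of one predecessor
edge, and PHI arguments are matched to edges by position.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Preds = std::vector<int>;

// Unordered remove: the last element moves into the vacated slot.
void
unordered_remove (Preds &preds, std::size_t idx)
{
  assert (idx < preds.size ());
  preds[idx] = preds.back ();
  preds.pop_back ();
}

// Route the edge at OLD_IDX through NEW_BB.  Returns the final position
// of the new edge.  With RESTORE_ORDER, swap it back into OLD_IDX so that
// positional PHI arguments still line up.
std::size_t
split_edge_sketch (Preds &preds, std::size_t old_idx, int new_bb,
                   bool restore_order)
{
  unordered_remove (preds, old_idx);  // old edge no longer enters dest
  preds.push_back (new_bb);           // new edge is appended last
  std::size_t new_idx = preds.size () - 1;
  if (restore_order && new_idx != old_idx)
    {
      std::swap (preds[new_idx], preds[old_idx]);
      new_idx = old_idx;
    }
  return new_idx;
}
```

Starting from predecessors {10, 6} and splitting the edge at index 0 with
new block 15, the order becomes {6, 15} without the swap, which pairs the
PHI arguments with the wrong edges, and {15, 6} with it.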

2025-06-12  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/120629
* cfgexpand.cc (expand_gimple_cond): If dest bb isn't BB_RTL,
has any PHI nodes and false_edge->dest_idx before redirection is
different from make_single_succ_edge result's dest_idx, swap the
latter with the former last pred edge and their dest_idx members.

* g++.dg/opt/pr120629.C: New test.

RISC-V: Add test for vec_dup + vmax.vv combine case 1 with max func 1 and GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vmax.vv combine to vmax.vx,
with GR2VR costs of 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vmax.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_dup + vmax.vv combine case 1 with max func 0 and GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vmax.vv combine to vmax.vx,
with GR2VR costs of 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vmax.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_dup + vmax.vv combine case 0 with max func 1 and GR2VR cost 0, 2 and 15

Add asm dump check test for vec_duplicate + vmax.vv combine to
vmax.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for max func 1 vmax.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_dup + vmax.vv combine case 0 with max func 0 and GR2VR cost 0, 2 and 15

Add asm dump check test for vec_duplicate + vmax.vv combine to vmax.vx,
with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for max func 1 vmax.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vmax.vv to vmax.vx on GR2VR cost

This patch combines vec_duplicate + vmax.vv into vmax.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine takes action if the GR2VR cost
is zero, and rejects the combination if the GR2VR cost is greater
than zero.

Assume we have example code like below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, /)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmax.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmax.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case SMAX.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op smax.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Incorrect removal of ZA restore [PR120624]

The PCS defines a lazy save scheme for managing ZA across normal
"private-ZA" functions.  GCC currently uses this scheme for calls
to all private-ZA functions (rather than using caller-save).

Therefore, before a sequence of calls to private-ZA functions, GCC emits
code to set up a lazy save.  After the sequence of calls, GCC emits code
to check whether lazy save was committed and restore the ZA contents
if so.

These sequences are emitted by the mode-switching pass, in an attempt
to reduce the number of redundant saves and restores.

The lazy save scheme also means that, before a function can use ZA,
it must first conditionally store the old contents of ZA to the caller's
lazy save buffer, if any.

This all creates some relatively complex dependencies between
setup code, save/restore code, and normal reads from and writes to ZA.
These dependencies are modelled using special fake hard registers:

    ;; Sometimes we use placeholder instructions to mark where later
    ;; ABI-related lowering is needed.  These placeholders read and
    ;; write this register.  Instructions that depend on the lowering
    ;; read the register.
    (LOWERING_REGNUM 87)

    ;; Represents the contents of the current function's TPIDR2 block,
    ;; in abstract form.
    (TPIDR2_BLOCK_REGNUM 88)

    ;; Holds the value that the current function wants PSTATE.ZA to be.
    ;; The actual value can sometimes vary, because it does not track
    ;; changes to PSTATE.ZA that happen during a lazy save and restore.
    ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
    (SME_STATE_REGNUM 89)

    ;; Instructions write to this register if they set TPIDR2_EL0 to a
    ;; well-defined value.  Instructions read from the register if they
    ;; depend on the result of such writes.
    ;;
    ;; The register does not model the architected TPIDR2_EL0, just the
    ;; current function's management of it.
    (TPIDR2_SETUP_REGNUM 90)

    ;; Represents the property "has an incoming lazy save been committed?".
    (ZA_FREE_REGNUM 91)

    ;; Represents the property "are the current function's ZA contents
    ;; stored in the lazy save buffer, rather than in ZA itself?".
    (ZA_SAVED_REGNUM 92)

    ;; Represents the contents of the current function's ZA state in
    ;; abstract form.  At various times in the function, these contents
    ;; might be stored in ZA itself, or in the function's lazy save buffer.
    ;;
    ;; The contents persist even when the architected ZA is off.  Private-ZA
    ;; functions have no effect on its contents.
    (ZA_REGNUM 93)

Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
in order to sequence the code with the initial setup of ZA and
with the lazy save scheme.

The code to restore ZA after a call involves several instructions,
including conditional control flow.  It is initially represented as
a single define_insn and is split late, after shrink-wrapping and
prologue/epilogue insertion.

The split form of the restore instruction includes a conditional call
to __arm_tpidr2_restore:

(define_insn "aarch64_tpidr2_restore"
  [(set (reg:DI ZA_SAVED_REGNUM)
(unspec:DI [(reg:DI R0_REGNUM)] UNSPEC_TPIDR2_RESTORE))
   (set (reg:DI SME_STATE_REGNUM)
(unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
  ...
)

The write to SME_STATE_REGNUM indicates the end of the region where
ZA_REGNUM might differ from the real contents of ZA.  In other words,
it is the point at which normal reads from ZA and writes to ZA
can safely take place.

To finally get to the point, the problem in this PR was that the
unsplit aarch64_restore_za pattern was missing this change to
SME_STATE_REGNUM.  It could therefore be deleted as dead before
it had a chance to be split.  The split form had the correct dataflow,
but the unsplit form didn't.

Unfortunately, the tests for this code tended to use calls and asms
to model regions of ZA usage, and those don't seem to be affected
in the same way.

gcc/
PR target/120624
* config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
SME_STATE_REGNUM.

gcc/testsuite/
PR target/120624
* gcc.target/aarch64/sme/za_state_7.c: New test.

Daily bump.

libstdc++: Uglify __mapping_alike template parameter and fix test and typo in comment.

When the static assert was generated from instantiations of default member
initializer of class B, the error was not generated for B<1, std::layout_left,
std::layout_left> case, only when -D_GLIBCXX_DEBUG was set. Changing B calls to
functions fixes that.

We also replace class with typename in template head of layout_right::mapping
constructors.

libstdc++-v3/ChangeLog:

* include/std/mdspan (__mdspan::__mapping_alike): Rename template
parameter from M to _M_p.
(layout_right::mapping): Replace class with typename in template
head.
(layout_stride::mapping): Fix typo in comment.
* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc:
Changed B to function.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Make layout_left(layout_stride) noexcept.

[mdspan.layout.left.cons] of N4950 states that this ctor is not
noexcept. Since all other ctors of layout_left, layout_right or
layout_stride are noexcept, the choice was made, based on
[res.on.exception.handling], to make this ctor noexcept.

Two other major standard library implementations make the same choice.

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left): Strengthen the exception
guarantees of layout_left::mapping(layout_stride::mapping).
* testsuite/23_containers/mdspan/layouts/ctors.cc:
Simplify tests to reflect the change.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Add tests for layout_stride.

Implements the tests for layout_stride and for the features of the other
two layouts that depend on layout_stride.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_stride.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add test for
layout_stride and the interaction with other layouts.
* testsuite/23_containers/mdspan/layouts/empty.cc: Ditto.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.
* testsuite/23_containers/mdspan/layouts/stride.cc: New test.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>