Gaius Mulley [Fri, 29 Aug 2025 21:10:29 +0000 (22:10 +0100)]
PR modula2/121709: Failed bootstrap in m2
This patch is a followup to PR modula2/121629 which uses
the cpp_include_defaults array to configure the default search path
entries. In particular it creates default search paths
based on LOCAL_INCLUDE_DIR, PREFIX_INCLUDE_DIR, gcc version path
and NATIVE_SYSTEM_HEADER_DIR.
Sirui Mu [Thu, 28 Aug 2025 13:48:24 +0000 (21:48 +0800)]
c++: array subscript with COND_EXPR as the array
The following minimum reproducer would miscompile with vanilla gcc:
extern int x[10], y[10];
bool g();
void f() { 0[g() ? x : y] = 1; }
gcc would mistakenly treat the subexpression (g() ? x : y) as a prvalue and
move that array to the stack. The following assignment would then write to the
stack instead of to the global arrays. When optimizations are enabled, this
assignment is discarded by dse and gcc generates the following code for the
f function:
"_Z1fi":
jmp "_Z1gv"
The miscompilation requires all the following conditions to be met:
- The array subscript expression is written as idx[array], instead of
the usual form array[idx];
- The "array" part must be a ternary expression (COND_EXPR in gcc tree)
and it must be an lvalue.
- The code must be compiled with -fstrong-eval-order which is the default
for -std=c++17 or later.
The cause of the issue lies in cp_build_array_ref, where it mistakenly
generates a COND_EXPR with ARRAY_TYPE into the IL when all the criteria above
are met. This patch resolves the issue by moving the
canonicalization step that transforms idx[array] to array[idx] earlier in
cp_build_array_ref, ensuring we handle these two forms of array subscript
consistently.
David Malcolm [Fri, 29 Aug 2025 18:39:37 +0000 (14:39 -0400)]
diagnostics: add GCC_DIAGNOSTICS_LOG
Whilst experimenting with PR diagnostics/121039 (potentially capturing
suppressed diagnostics in SARIF output), I found it very useful to have
a text log from the diagnostic subsystem to track what it's doing and
the decisions it's making (e.g. exactly when and why a diagnostic is
being rejected).
This patch adds a simple logging mechanism to the diagnostics subsystem,
enabled by setting GCC_DIAGNOSTICS_LOG in the environment, which emits
nested text like this to stderr (or a named file):
warning (option_id: 668, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
  diagnostics::context::diagnostic_impl (option_id: 668, kind: warning, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
    diagnostics::context::report_diagnostic
      rejecting: diagnostic not enabled
  false <- diagnostics::context::diagnostic_impl
false <- warning
This logging mechanism doesn't use pretty_printer because it can be
helpful to use it to debug pretty_printer itself.
xtensa: Rewrite bswapsi2_internal with compact syntax
Also, omitting the instruction that sets the shift amount register
(SAR) to 8 is now handled more effectively: it is omitted if there was a previous
bswapsi2 in the same BB, but not omitted if no bswapsi2 is found or another
insn that modifies SAR is found first (see below).
Note that the five instructions for writing to SAR are as follows, along
with the insns that use them (except for bswapsi2_internal itself):
* config/xtensa/xtensa-protos.h (xtensa_bswapsi2_output):
New function prototype.
* config/xtensa/xtensa.cc
(xtensa_bswapsi2_output_1, xtensa_bswapsi2_output):
New functions.
* config/xtensa/xtensa.md (bswapsi2_internal):
Rewrite in compact syntax and use xtensa_bswapsi2_output() as asm
output.
Jeff Law [Fri, 29 Aug 2025 17:43:30 +0000 (11:43 -0600)]
[RISC-V][PR target/121548] Avoid bogus index into recog operand cache
So the RISC-V port has attributes which indicate the index within the
recog_data where certain operands will be found.
For this BZ the default value for the merge_op_idx attribute on the given insn
is "2". But the insn only has operands 0 & 1. So we do an out-of-bounds array
access and boom, the ICE/valgrind failure.
As we discussed in the patchwork meeting, this is all a bit clunky and has been
fairly error-prone. This doesn't add any massive checking, but does introduce
some asserts to help catch problems a bit earlier and more clearly.
In particular in cases where we're already asserting that the returned index is
valid (!= INVALID_ATTRIBUTE) we also assert that the index is less than the
total number of operands.
In the get_vlmax_ta_preferred_avl routine it appears we need to handle
these two cases more gracefully as we apparently legitimately query for the
merge_op_idx on a fairly arbitrary insn. We just have to make sure to not
*use* the result if it's INVALID_ATTRIBUTE. So for that code we assert that
merge_op_idx is either INVALID_ATTRIBUTE or smaller than the number of
operands.
This patch also adds overrides for 3 patterns to return INVALID_ATTRIBUTE for
merge_op_idx, similar to how they already do for mode_idx and avl_type_idx.
This has been bootstrapped and regression tested on the bpi & pioneer systems
and regression tested for riscv32-elf and riscv64-elf. Waiting on CI before
pushing.
PR target/121548
gcc/
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): Assert
MODE_IDX is smaller than the number of operands.
(simplify_replace_vlmax_avl): Similarly.
(pass_avlprop::get_vlmax_ta_preferred_avl): Similarly.
* config/riscv/vector.md: Override merge_op_idx computation
for simple moves, just like is done for avl_type_idx and mode_idx.
Harald Anlauf [Thu, 28 Aug 2025 20:07:10 +0000 (22:07 +0200)]
Fortran: improve compile-time checking of character dummy arguments [PR93330]
PR fortran/93330
gcc/fortran/ChangeLog:
* interface.cc (get_sym_storage_size): Add argument size_known to
indicate that the storage size could be successfully determined.
(get_expr_storage_size): Likewise.
(gfc_compare_actual_formal): Use them to handle zero-sized dummy
and actual arguments.
If a character formal argument has the pointer or allocatable
attribute, or is an array that is neither assumed size nor explicit size,
we generate an error by default unless -std=legacy is specified,
which falls back to just giving a warning.
If -Wcharacter-truncation is given, warn about a character actual
argument longer than the dummy. Generate an error instead of just a
warning for too-short scalar character actual arguments if -std=f*
is given.
RISC-V: Add patterns for vector-scalar IEEE floating-point min
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an unspec_vfmin RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfmin.vv v1,v1,v2
After, we get only one:
vfmin.vf v1,v1,fa0
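For reference, a hypothetical source loop that exposes this combine opportunity
(the function name and the use of __builtin_fminf are assumptions for
illustration, not taken from the testsuite):
    /* The scalar operand b is broadcast (vfmv.v.f) and, with this patch,
       folded into vfmin.vf by combine/late-combine.  */
    void
    vfmin_vf_example (float *__restrict a, float b, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] = __builtin_fminf (a[i], b);
    }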
gcc/ChangeLog:
* config/riscv/autovec-opt.md
(*vfmin_vf_ieee_<mode>): Add new patterns to combine vec_duplicate +
vfmin.vv (unspec) into vfmin.vf.
(*vfmul_vf_<mode>, *vfrdiv_vf_<mode>, *vfmin_vf_<mode>): Fix attribute
types.
* config/riscv/vector.md (@pred_<ieee_fmaxmin_op><mode>_scalar): Allow
VLS modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Add vfmin.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f64.c: New test.
x86: Improve vector_loop/unrolled_loop for memset/memcpy
The vector_loop/unrolled_loop expansion uses move_by_pieces and store_by_pieces
to expand the memcpy/memset epilogue even when
targetm.use_by_pieces_infrastructure_p returns false. To fix this:
1. Add by_pieces_in_use to machine_function to indicate that by_pieces op
is currently in use.
2. Set and clear by_pieces_in_use when expanding memcpy/memset epilogue
with move_by_pieces and store_by_pieces.
3. Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P to return true if
by_pieces_in_use is true.
gcc/
PR target/121096
* config/i386/i386-expand.cc (expand_cpymem_epilogue): Set and
clear by_pieces_in_use when using by_pieces op.
(expand_setmem_epilogue): Likewise.
* config/i386/i386.cc (ix86_use_by_pieces_infrastructure_p): New.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Likewise.
* config/i386/i386.h (machine_function): Add by_pieces_in_use.
to check if 2 TLS_CALL patterns have the same source.
For TLS64_COMBINE, use both UNSPEC_TLSDESC and UNSPEC_DTPOFF unspecs to
check if 2 TLS64_COMBINE patterns have the same source.
gcc/
PR target/121694
* config/i386/i386-features.cc (redundant_pattern): Add
tlsdesc_val.
(pass_x86_cse): Likewise.
(pass_x86_cse::tls_set_insn_from_symbol): New member function.
(pass_x86_cse::candidate_gnu2_tls_p): Set tlsdesc_val. For
TLS64_COMBINE, match both UNSPEC_TLSDESC and UNSPEC_DTPOFF
symbols. For TLS64_CALL, match the UNSPEC_TLSDESC symbol.
(pass_x86_cse::x86_cse): Initialize the tlsdesc_val field in
load. Pass the tlsdesc_val field to ix86_place_single_tls_call
for X86_CSE_TLSDESC.
Jason Merrill [Fri, 29 Aug 2025 08:11:11 +0000 (10:11 +0200)]
c++: -fimplicit-constexpr testcase tweak
If B::get is (implicitly or explicitly) constexpr the individual b bindings
have constant initialization and get optimized away, so their symbols don't
appear in the assembly.
Tobias Burnus [Fri, 29 Aug 2025 07:51:06 +0000 (09:51 +0200)]
invoke.texi: AMD GCN - remove '(experimental)' from some gfx*-generic
GCC added generic support in r15-7406-gb5a29a93ee29a8 (Feb 2025) with an
'(experimental)' marker, also because ROCm only supported it in their
git repository and not in a released version. Since ROCm 6.4 (Apr 2025),
generic is also supported in released ROCm versions - and has meanwhile
been tested by us.
For well-tested architectures, there is no
reason that a binary compiled for the associated generic architecture
should perform any differently from the specific version. Hence, this commit
removes the marker for gfx-9-generic (gfx900, gfx906 and gfx90c are known-to-work
specific architectures), gfx10-3-generic (likewise for gfx1030
and gfx1036), and gfx11-generic (gfx1100 and gfx1103).
gcc/ChangeLog:
* doc/invoke.texi (AMD GCN Options: -march): Remove '(experimental)'
from gfx-{9,10-3,11}-generic.
Tobias Burnus [Fri, 29 Aug 2025 07:47:52 +0000 (09:47 +0200)]
install.texi: For amdgcn, clarify which llvm-* binaries are required
Also remove the future tense for ROCm, as 6.4.0 was released in April 2025
and supports generic architectures.
gcc/ChangeLog:
* doc/install.texi (amdgcn): Clarify which binaries must be the
LLVM version and which must be installed. Update version data for
ROCm for generic architectures.
Andrew Pinski [Fri, 29 Aug 2025 05:38:19 +0000 (22:38 -0700)]
i386: Fix vect-pragma-target-[12].c testcase for -march=XYZ [PR120643]
These 2 testcases were originally designed for the default -march= of
x86_64, so if you pass -march=native (on a target with AVX512 enabled),
they will fail. To fix this, we add `-mno-sse3 -mtune=generic`
to the options to force a specific arch for the testcases.
Changes since v1:
* v2: Use -mtune=generic instead of -mprefer-vector-width=512.
Tested on a skylake-avx512 machine with -march=native.
PR testsuite/120643
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-pragma-target-1.c: Add `-mno-sse3 -mtune=generic`
to the options.
* gcc.target/i386/vect-pragma-target-2.c: Likewise.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Fri, 29 Aug 2025 05:25:10 +0000 (22:25 -0700)]
aarch64/testsuite: Fix vld2-1.c after r16-3201 [PR121713]
After r16-3201-gee67004474d521, this testcase started to fail as
we can now copy propagate into arguments, so the number of "after previous"
checks has doubled.
Pushed after a quick check to make sure the testcase is now passing.
PR testsuite/121713
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vld2-1.c: Update the number of "after previous"
checks.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Pan Li [Thu, 28 Aug 2025 02:33:54 +0000 (10:33 +0800)]
RISC-V: Combine vec_duplicate + vnmsac.vv to vnmsac.vx on GR2VR cost
This patch would like to combine vec_duplicate + vnmsac.vv into
vnmsac.vx, as in the example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
#define DEF_VX_TERNARY_CASE_0(T, OP_1, OP_2, NAME)                           \
  void                                                                       \
  test_vx_ternary_##NAME##_##T##_case_0 (T * restrict vd, T * restrict vs2,  \
                                         T rs1, unsigned n)                  \
  {                                                                          \
    for (unsigned i = 0; i < n; i++)                                         \
      vd[i] = vd[i] OP_2 vs2[i] OP_1 rs1;                                    \
  }
* config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Add new
pattern to combine to vx.
* config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Add new
pattern to generate rtl.
(*pred_nmsac_<mode>_scalar_undef): Ditto.
Jonathan Wakely [Tue, 19 Aug 2025 16:08:07 +0000 (17:08 +0100)]
fixincludes: Skip pthread_incomplete_struct_argument for modern glibc [PR118009]
The pthread_incomplete_struct_argument fix was intended for ancient
versions of Glibc (only 2.3.3 and 2.3.4, I believe). From Glibc 2.3.5
the pthread.h header already included the change to use a pointer
instead of an array, so the fixinclude was no longer used.
However, the https://sourceware.org/bugzilla/show_bug.cgi?id=26647 fix
changed the __setjmpbuf declaration to use struct __jmp_buf_tag __env[1]
again, which caused this fixinclude to start matching again. This means
that GCC now installs a "fixed" pthread.h with a change to a declaration
that is guarded by #if ! __GNUC_PREREQ (11, 0), i.e. it's not even relevant
for modern versions of GCC. The "fixed" pthread.h causes problems for
users because of changes to internal implementation details of the
pthread_cond_t type, which require the "fixed" pthread.h to be updated
with mkheaders if Glibc is updated.
This change adds a bypass to the fixinclude, so that it no longer
matches modern Glibc versions, and only applies to glibc versions 2.3.3
and 2.3.4 as originally intended.
Also remove outdated reference to svn in the comment at the top of the
generated file.
fixincludes/ChangeLog:
PR bootstrap/118009
PR bootstrap/119089
* inclhack.def (pthread_incomplete_struct_argument): Add bypass.
* fixincl.tpl: Remove reference to svn in comment.
* fixincl.x: Regenerate.
Jonathan Wakely [Thu, 10 Apr 2025 16:50:59 +0000 (17:50 +0100)]
libstdc++: Implement C++26 <debugging> features [PR119670]
This implements P2546R5 (Debugging Support), including the P2810R4
(is_debugger_present is_replaceable) changes, allowing
std::is_debugger_present to be replaced by the program.
It would be good to provide a macOS definition of is_debugger_present as
per https://developer.apple.com/library/archive/qa/qa1361/_index.html
but that isn't included in this change.
The src/c++26/debugging.cc file defines a global volatile int which can
be set by debuggers to indicate when they are attached and detached from
a running process. This allows std::is_debugger_present() to give a
reliable answer, and additionally allows a debugger to choose how
std::breakpoint() should behave. Setting the global to a positive value
will cause std::breakpoint() to use that value as an argument to
std::raise, so debuggers that prefer SIGABRT for breakpoints can select
that. By default std::breakpoint() will use a platform-specific action
such as the INT3 instruction on x86, or GCC's __builtin_trap().
On Linux the std::is_debugger_present() function checks whether the
process is being traced by a process named "gdb", "gdbserver" or
"lldb-server", to try to avoid interpreting other tracing processes
(such as strace) as a debugger. There have been comments suggesting this
isn't desirable and that std::is_debugger_present() should just return
true for any tracing process (which is the case for non-Linux targets
that support the ptrace system call).
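A minimal usage sketch of the three new functions (assuming a toolchain built
with C++26 support):
    #include <debugging>
    #include <iostream>

    int main()
    {
      if (std::is_debugger_present ())
        std::breakpoint ();              // pause here only when a debugger is attached
      std::breakpoint_if_debugging ();   // convenience form of the check above
      std::cout << "continuing normally\n";
    }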
libstdc++-v3/ChangeLog:
PR libstdc++/119670
* acinclude.m4 (GLIBCXX_CHECK_DEBUGGING): Check for facilities
needed by <debugging>.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_DEBUGGING.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (debugging): Add.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Add new header.
* src/c++26/Makefile.am: Add new file.
* src/c++26/Makefile.in: Regenerate.
* include/std/debugging: New file.
* src/c++26/debugging.cc: New file.
* testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
* testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
New test.
* testsuite/19_diagnostics/debugging/is_debugger_present.cc: New
test.
* testsuite/19_diagnostics/debugging/is_debugger_present-2.cc:
New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andrew Pinski [Fri, 22 Aug 2025 04:37:59 +0000 (21:37 -0700)]
passes: Move cleanup_eh before first tailr [PR115201]
So the current pass order is:
```
NEXT_PASS (pass_tail_recursion);
NEXT_PASS (pass_if_to_switch);
NEXT_PASS (pass_convert_switch);
NEXT_PASS (pass_cleanup_eh);
```
But nothing in if_to_switch nor convert_switch changes the IR
in a way that cleanup_eh would take into account.
tail_recursion benefits the most by not having "almost" empty landing pads.
This order was originally chosen when cleanup_eh was added in r0-92178-ga8da523f8a442f,
but it looks like it was simply placed just before inlining rather than with any
thought that it could improve earlier passes.
An example where this helps is PR 115201 where we have:
```
;; basic block 5, loop depth 0, maybe hot
;; prev block 4, next block 6, flags: (NEW, REACHABLE, VISITED)
;; pred: 4 (TRUE_VALUE,EXECUTABLE)
[LP 1] # .MEM_19 = VDEF <.MEM_45>
# USE = nonlocal escaped
# CLB = nonlocal escaped
D.4770 = _Z12binarySearchIi2itIiEET0_RKT_S2_S2_D.4690 (item_15(D), startD.4711, midD.4717);
goto <bb 7>; [INV]
;; succ: 8 (EH,EXECUTABLE)
;; 7 (FALLTHRU,EXECUTABLE)
...
Wilco Dijkstra [Wed, 13 Aug 2025 14:56:57 +0000 (14:56 +0000)]
AArch64: Add isinf expander [PR 66462]
Add an expander for isinf using integer arithmetic. This is
typically faster and avoids generating spurious exceptions on
signaling NaNs. This fixes part of PR66462.
int isinf1 (float x) { return __builtin_isinf (x); }
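A rough sketch of the integer-arithmetic idea (illustrative only, not the
actual expander; it assumes IEEE binary32 floats):
    #include <cstdint>
    #include <cstring>

    /* x is infinite iff, ignoring the sign bit, all exponent bits are set
       and the mantissa is zero, i.e. the remaining bits equal 0x7f800000.
       No FP comparison is performed, so no exceptions are raised.  */
    int
    isinf_bits (float x)
    {
      std::uint32_t bits;
      std::memcpy (&bits, &x, sizeof bits);
      return (bits & 0x7fffffff) == 0x7f800000;
    }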
Richard Biener [Wed, 27 Aug 2025 12:01:36 +0000 (14:01 +0200)]
Compute reduction var in vectorize_fold_left_reduction
Instead of going via the PHI node accessible through the reduc-dec
link, use the scalar def of the reduction SLP node. Compute this
in vectorize_fold_left_reduction itself.
* tree-vect-loop.cc (vectorize_fold_left_reduction): Do not get
reduc_var as argument, instead compute it here.
(vect_transform_reduction): Adjust.
libstdc++: Remove implicit type conversions in std::complex
The current implementation of `complex<_Tp>` assumes that
`int` is implicitly convertible to `_Tp`, e.g., when using
`complex<_Tp>(1)`.
This patch transforms the implicit conversions into explicit type casts.
As a result, `std::complex` is now able to support more types. One
example is the type `Eigen::Half` from
https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
implement implicit type conversions.
libstdc++-v3/ChangeLog:
* include/std/complex (polar, __complex_sqrt, pow)
(__complex_pow_unsigned): Use explicit conversions from int to
the complex value_type.
Asking std::is_constructible_v<std::bitset<1>, NonTrivial*> gives an
error, rather than answering the query. The problem is that the
constructor for std::bitset("010101") is not constrained to only accept
pointers to char-like types, and for the second parameter (which has a
default argument) std::basic_string_view<CharT> gets instantiated. If
the type is not char-like then that has undefined behaviour, and might
trigger a static_assert to fail in the body of std::basic_string_view.
We can fix it by constraining that constructor using the requirements
for char-like types from [strings.general] p1. I've submitted LWG 4294
and proposed making this change in the standard.
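A small sketch of the query this fixes (NonTrivial comes from the PR; its
exact definition here is an assumption):
    #include <bitset>
    #include <type_traits>

    struct NonTrivial { NonTrivial (); };   // not a char-like type

    // With the constrained constructor the trait now simply answers "false"
    // instead of triggering an error inside std::basic_string_view.
    static_assert (!std::is_constructible_v<std::bitset<1>, NonTrivial*>);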
libstdc++-v3/ChangeLog:
PR libstdc++/121046
* include/std/bitset (bitset(const CharT*, ...)): Add
constraints on CharT type.
* testsuite/23_containers/bitset/lwg4294.cc: New test.
Tomasz Kamiński [Thu, 28 Aug 2025 09:10:05 +0000 (11:10 +0200)]
libstdc++: Provide helpers to interoperate between __cmp_cat::_Ord and ordering types.
This patch adds two new internal helpers for ordering types:
* __cmp_cat::__ord to retrieve an internal _Ord value,
* __cmp_cat::__make<Ordering> to create an ordering from an _Ord value.
Conversions between ordering types are now handled by __cmp_cat::__make. As a
result, ordering types no longer need to befriend each other, only the new
helpers.
The __fp_weak_ordering implementation has also been simplified by:
* using the new helpers to convert partial_ordering to weak_ordering,
* using strong_ordering to weak_ordering conversion operator,
for the __isnan_sign comparison,
* removing the unused __cat local variable.
Finally, the _Ncmp enum is removed, and the unordered enumerator is added
to the existing _Ord enum.
Nathaniel Shead [Wed, 27 Aug 2025 12:24:43 +0000 (22:24 +1000)]
c++/modules: Add explanatory note for incomplete types with definition in different module [PR119844]
The confusion in the PR arose because the definition of 'User' in a
separate named module did not provide an implementation for the
forward-declaration in the global module. This seems likely to be a
common mistake while people are transitioning to modules, so this patch
adds an explanatory note.
While I was looking at this I also noticed that the existing handling of
partial specialisations for this note was wrong (we pointed at the
primary template declaration rather than the relevant partial spec), so
this patch fixes that up, and also gives a more precise error message
for using a template other than by self-reference while it's being
defined.
PR c++/119844
gcc/cp/ChangeLog:
* typeck2.cc (cxx_incomplete_type_inform): Add explanation when
a similar type is complete but attached to a different module.
Also fix handling of partial specs and templates.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr119844_a.C: New test.
* g++.dg/modules/pr119844_b.C: New test.
Gaius Mulley [Thu, 28 Aug 2025 11:08:33 +0000 (12:08 +0100)]
PR modula2/121629: adding third party modules
This patch makes it easier to add third party modules.
cc1gm2 now appends the search directory prefix/include/m2
to the search path for non dialect specific modules.
Prior to this it appends the dialect specific subdirectories
{m2pim,m2iso,m2log,m2min} with the appropriate dialect pathname.
The patch also includes a new option -fm2-pathname-root=prefix
which allows additional prefix/m2 directories to be searched
before the default.
gcc/ChangeLog:
PR modula2/121629
* doc/gm2.texi (Module Search Path): New section.
(Compiler options): New option -fm2-pathname-root=.
New option -fm2-pathname-rootI.
gcc/m2/ChangeLog:
PR modula2/121629
* gm2-compiler/PathName.mod: Add copyright notice.
* gm2-lang.cc (named_path): Add field lib_root.
(push_back_Ipath): Set lib_root false.
(push_back_lib_root): New function.
(get_dir_sep_size): Ditto.
(add_path_component): Ditto.
(add_one_import_path): Ditto.
(add_non_dialect_specific_path): Ditto.
(foreach_lib_gen_import_path): Ditto.
(get_module_source_dir): Ditto.
(add_default_include_paths): Ditto.
(assign_flibs): Ditto.
(m2_pathname_root): Ditto.
(add_m2_import_paths): Remove function.
(gm2_langhook_post_options): Call assign_flibs.
Check np.lib_root and call foreach_lib_gen_import_path.
Replace call to add_m2_import_paths with a call to
add_default_include_paths.
(gm2_langhook_handle_option): Add case OPT_fm2_pathname_rootI_.
* gm2spec.cc (named_path): Add field lib_root.
(push_back_Ipath): Set lib_root false.
(push_back_lib_root): New function.
(add_m2_I_path): Add OPT_fm2_pathname_rootI_ option
if np.lib_root.
(lang_specific_driver): Add case OPT_fm2_pathname_root_.
* lang.opt (fm2-pathname-root=): New option.
(fm2-pathname-rootI=): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/121629
* gm2/switches/pathnameroot/pass/switches-pathnameroot-pass.exp: New test.
* gm2/switches/pathnameroot/pass/test.mod: New test.
* gm2/switches/pathnameroot/pass/testlib/m2/foo.def: New test.
* gm2/switches/pathnameroot/pass/testlib/m2/foo.mod: New test.
Tobias Burnus [Thu, 28 Aug 2025 08:52:01 +0000 (10:52 +0200)]
[gcn] gcc/configure.ac + install.texi - changes to detect HAVE_AS_LEB128 [PR119367]
The llvm-mc assembler by default assembles to another assembly file and not to an ELF
binary; that usually does not matter – but for the LEB128 check, additionally, the
resulting binary is checked. Hence, when using llvm-mc as the target assembler for
amdgcn-*-*, we better add the "--filetype=obj -triple=amdgcn--amdhsa" flags. The
current patch does so unconditionally, assuming that llvm-mc is always used.
Additionally, the resulting ELF file is checked, which requires an ELF reader such
as objdump. This commit adds llvm-objdump to the build documentation for amdgcn,
although, e.g., Binutils' 'objdump' would also do - as long as either
amdgcn-amdhsa-objdump or amdgcn-amdhsa/bin/objdump is found during the amdgcn
cross build.
gcc/ChangeLog:
PR debug/119367
* acinclude.m4 (gcc_GAS_FLAGS): For gcn, use "--filetype=obj
-triple=amdgcn--amdhsa", if supported.
* configure: Regenerate.
* doc/install.texi (amdgcn-*-*): Also add llvm-objdump to the list of
to-be-copied files.
Jakub Jelinek [Thu, 28 Aug 2025 08:51:09 +0000 (10:51 +0200)]
c++: Fix auto return type deduction with expansion statements [PR121583]
The following testcase ICEs during expansion, because cfun->returns_struct
wasn't cleared, despite auto being deduced to int.
The problem is that check_return_type -> apply_deduced_return_type
is called when parsing the expansion stmt body, at that time
processing_template_decl is non-zero and apply_deduced_return_type
in that case doesn't do the
if (function *fun = DECL_STRUCT_FUNCTION (fco))
{
bool aggr = aggregate_value_p (result, fco);
#ifdef PCC_STATIC_STRUCT_RETURN
fun->returns_pcc_struct = aggr;
#endif
fun->returns_struct = aggr;
}
My assumption is that !processing_template_decl in that case
is used in the sense "the fco function is not a function template";
for function templates there is no reason to bother with fun->returns*struct,
as nothing will care about that.
When returning a type dependent expression in the expansion stmt
body, apply_deduced_return_type just won't be called during parsing,
but only when instantiating the body, and all will be fine. But when
returning a non-type-dependent expression, while check_return_type
will be called again during instantiation of the body, as the return
type is no longer auto in that case apply_deduced_return_type will not
be called again and so nothing will fix up fun->returns*struct.
The following patch fixes that by using !uses_template_parms (fco)
check instead of !processing_template_decl.
2025-08-28 Jakub Jelinek <jakub@redhat.com>
PR c++/121583
* semantics.cc (apply_deduced_return_type): Adjust
fun->returns*_struct when !uses_template_parms (fco) instead of
when !processing_template_decl.
* g++.dg/cpp26/expansion-stmt23.C: New test.
* g++.dg/cpp26/expansion-stmt24.C: New test.
Jakub Jelinek [Thu, 28 Aug 2025 08:46:51 +0000 (10:46 +0200)]
c++: Fix ICE with parameter uses in expansion stmts [PR121575]
The following testcase shows an ICE when a parameter of a non-template
function is referenced in expansion stmt body.
tsubst_expr in that case assumes that the PARM_DECL either has a registered
local specialization, or is the `this` argument, or is in an unevaluated context.
Parameters are always defined outside of the expansion statement
for-range-declaration or body, so for the instantiation of the body
outside of templates should always map to themselves.
It could be fixed by registering local self-specializations for all the
function parameters, but just handling it in tsubst_expr seems to be easier
and less costly.
Some PARM_DECLs, e.g. from concepts, have NULL DECL_CONTEXT; those are
handled like before (and we assert it is an unevaluated operand). For others
this checks if the PARM_DECL is from a non-template and in that case it
will just return t.
2025-08-28 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>
PR c++/121575
* pt.cc (tsubst_expr) <case PARM_DECL>: If DECL_CONTEXT (t) isn't a
template return t for PARM_DECLs without local specialization.
Richard Biener [Wed, 27 Aug 2025 12:40:37 +0000 (14:40 +0200)]
Avoid mult pattern if that will break reduction constraints
synth-mult introduces multiple uses of a reduction variable
in some cases, which will ultimately fail vectorization (or ICE
with a pending change). So avoid applying the pattern in such cases.
* tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid
in cases that introduce multiple uses of reduction operands.
Jakub Jelinek [Thu, 28 Aug 2025 08:10:28 +0000 (10:10 +0200)]
configure: Add readelf fallback for HAVE_AS_ULEB128 test [PR119367]
The following patch adds a readelf fallback if neither objdump nor otool
exists. GNU binutils readelf, eu-readelf and llvm-readelf can all
handle it with those options.
2025-08-28 Jakub Jelinek <jakub@redhat.com>
PR debug/119367
* configure.ac (gcc_cv_as_leb128): Add fallback using readelf.
Grammar fix in comment.
* configure: Regenerate.
Jakub Jelinek [Thu, 28 Aug 2025 08:09:14 +0000 (10:09 +0200)]
dwarf2out: Use DW_LNS_advance_pc instead of DW_LNS_fixed_advance_pc if possible [PR119367]
In the usual case we use .loc directives and don't emit the line table
manually. And assembler usually uses DW_LNS_advance_pc which has
uleb128 argument and in most cases will have just a single byte operand.
But if we do emit it for whatever reason (old or buggy assembler or
-gno-as-loc{,view}-support option), we do use DW_LNS_fixed_advance_pc
instead, which has a fixed 2-byte operand. That is both wasteful
in the usual case of very small advances, and more importantly will
just result in assembler errors if we need to advance over more than 65535
bytes.
The following patch uses DW_LNS_advance_pc instead if assembler supports
.uleb128 directive with a difference of two labels in the same section.
This is only possible if Minimum Instruction Length in the .debug_line
header is 1 (otherwise DW_LNS_advance_pc operand is multiplied by that
value and DW_LNS_fixed_advance_pc is not), but we emit 1 for that
on all targets.
Looking at dwarf2out.o (from dwarf2out.cc with this patch)
compiled with compilers before/after this change with additional -fpic
-gno-as-loc{,view}-support options, I see .debug_line section shrunk from
878067 bytes to 773381 bytes, so shrink by 12%.
Admittedly gas generated .debug_line is even smaller, 501374 bytes (with
-fpic and without -gno-as-loc{,view}-support options).
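For reference, a sketch of why the uleb128 form wins for small advances
(a hypothetical encoder, not GCC's implementation):
    #include <cstdint>
    #include <cstddef>

    /* Encode value as unsigned LEB128.  A PC advance of, say, 12 takes one
       byte here, versus the fixed two-byte operand of DW_LNS_fixed_advance_pc.  */
    std::size_t
    encode_uleb128 (std::uint64_t value, std::uint8_t *buf)
    {
      std::size_t n = 0;
      do
        {
          std::uint8_t byte = value & 0x7f;
          value >>= 7;
          if (value != 0)
            byte |= 0x80;   /* more bytes follow */
          buf[n++] = byte;
        }
      while (value != 0);
      return n;
    }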
2025-08-28 Jakub Jelinek <jakub@redhat.com>
PR debug/119367
* dwarf2out.cc (output_one_line_info_table) <case LI_adv_address>: If
HAVE_AS_LEB128, use DW_LNS_advance_pc with dw2_asm_output_delta_uleb128
instead of DW_LNS_fixed_advance_pc with dw2_asm_output_delta.
Paul Thomas [Thu, 28 Aug 2025 07:17:14 +0000 (08:17 +0100)]
Fortran: Constructors with PDT components did not work [PR82843]
2025-08-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/82843
* intrinsic.cc (gfc_convert_type_warn): If the 'from_ts' is a
PDT instance, copy the derived type to the target ts.
* resolve.cc (gfc_resolve_ref): A PDT component in a component
reference can be that of the pdt_template. Unconditionally use
the component of the PDT instance to ensure that the backend_decl
is set during translation. Likewise, if a component is
encountered that is a PDT template type, use the component
parameters to convert to the correct PDT instance.
gcc/testsuite/
PR fortran/82843
* gfortran.dg/pdt_40.f03: New test.
Paul Thomas [Thu, 28 Aug 2025 07:10:04 +0000 (08:10 +0100)]
Fortran: Implement correct form of PDT constructors [PR82205]
2025-08-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/82205
* decl.cc (gfc_get_pdt_instance): Copy the default initializer
for components that are not PDT parameters or parameterized. If
any component is a pointer or allocatable set the attributes
'pointer_comp' or 'alloc_comp' of the new PDT instance.
* primary.cc (gfc_match_rvalue): Implement the correct form of
PDT constructors with 'name (type parms)(component values)'.
* trans-array.cc (structure_alloc_comps): Apply scalar default
initializers. Array initializers await the coming change in PDT
representation.
* trans-io.cc (transfer_expr): Do not output the type parms of
a PDT in list directed output.
gcc/testsuite/
PR fortran/82205
* gfortran.dg/pdt_22.f03: Use the correct form for PDT constructors.
* gfortran.dg/pdt_23.f03: Likewise.
* gfortran.dg/pdt_3.f03: Likewise.
Jeff Law [Wed, 27 Aug 2025 22:37:19 +0000 (16:37 -0600)]
Remove xfail marker on RISC-V test
So yet another testsuite hygiene patch. This time turning XPASS -> PASS. My
tester treats those cases the same so I didn't get notified that nozicond-2.c
was passing after some recent changes.
This removes the xfail marker on that test and thus the test is expected to
pass now.
r16-2648-gaebbc90d8c7c70 had a copy-and-pasto where
the second statement was supposed to be setting
operand 1 of the phi but it was setting operand 0 instead.
This fixes the typo.
Pushed as obvious after a quick build test for x86_64-linux-gnu.
PR tree-optimization/121695
gcc/ChangeLog:
* tree-if-conv.cc (factor_out_operators): Fix typo
in assignment of the phi.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121695-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Tomasz Kamiński [Wed, 27 Aug 2025 14:43:16 +0000 (16:43 +0200)]
libstdc++: Use _M_reverse to reverse partial_ordering using operator<=>
The patch r16-3414-gfcb3009a32dc33 changed the representation of unordered to
optimize reversing of the order, but it did not update the implementation of
reversing operator<=>(0, partial_ordering).
libstdc++-v3/ChangeLog:
* libsupc++/compare
(operator<=>(__cmp_cat::__unspec, partial_ordering)):
Implement using _M_reverse.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Nathan Myers [Thu, 21 Aug 2025 00:24:29 +0000 (20:24 -0400)]
libstdc++: Move tai_- and gps_clock::now impls out of ABI
This patch moves std::tai_clock::now() and std::gps_clock::now()
definitions from header inlines to static members invoked via a
normal function call, in service of stabilizing the C++20 ABI.
It also changes #if guards to mention the actual __cpp_lib_*
feature gated, not just the language version, for clarity.
New global function symbols std::chrono::tai_clock::now
and std::chrono::gps_clock::now are exported.
Tomasz Kamiński [Tue, 26 Aug 2025 12:34:35 +0000 (14:34 +0200)]
libsupc++: Change _Unordered comparison value to minimum value of signed char.
For any minimum value of a signed type, its negation (with wraparound) results
in the same value, behaving like zero. Representing the unordered result with
this minimum value, along with 0 for equal, 1 for greater, and -1 for less
in partial_ordering, allows its value to be reversed using unary negation.
The operator<=(partial_ordering, 0) now checks if the reversed value is
non-negative. This works correctly because the unordered value remains unchanged
under negation and thus stays negative.
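A small sketch of the negation trick on the raw values (illustrative only; the
real code lives in <compare>):
    #include <cstdint>

    // -1 = less, 0 = equivalent, 1 = greater, -128 = unordered.
    constexpr std::int8_t
    reverse_ord (std::int8_t v)
    {
      // Unary negation swaps less/greater, keeps equivalent, and leaves the
      // minimum value (unordered) unchanged after conversion back to int8_t.
      return static_cast<std::int8_t> (-v);
    }

    static_assert (reverse_ord (-1) == 1, "");
    static_assert (reverse_ord (1) == -1, "");
    static_assert (reverse_ord (0) == 0, "");
    static_assert (reverse_ord (-128) == -128, "");   // stays negative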
libstdc++-v3/ChangeLog:
* libsupc++/compare (_Ncmp::_Unordered): Rename and change the value
to minimum value of signed char.
(_Ncmp::unordered): Renamed from _Unordered, as the name is reserved
by partial_ordering::unordered.
(partial_ordering::_M_reverse()): Define.
(operator<=(partial_ordering, __cmp_cat::__unspec))
(operator>=(__cmp_cat::__unspec, partial_ordering)): Implemented
in terms of negated _M_value.
(operator>=(partial_ordering, __cmp_cat::__unspec))
(operator<=(__cmp_cat::__unspec, partial_ordering)): Directly
compare _M_value, as unordered value is negative.
(partial_ordering::unordered): Handle _Ncmp::unordered rename.
* python/libstdcxx/v6/printers.py: Add -128 as integer value
for unordered, keeping 2 to preserve backward compatibility.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Wed, 27 Aug 2025 12:00:51 +0000 (14:00 +0200)]
c++: Fix up cpp_warn on __STDCPP_FLOAT*_T__ [PR121520]
I got the cpp_warn on __STDCPP_FLOAT*_T__ wrong for the case where we aren't
predefining those, so e.g. on powerpc64le we don't diagnose #undef __STDCPP_FLOAT16_T__.
I've added it as an else if on the
if (c_dialect_cxx () && cxx_dialect > cxx20 && !floatn_nx_types[i].extended)
condition, which means that when a target supports some
extended type like _Float32x, cpp_warn is called on __STDCPP_FLOAT32_T__
(whereas when it supported _Float32 as well it did cpp_define_warn
(pfile, "__STDCPP_FLOAT32_T__=1") earlier).
On targets where the types aren't supported the earlier
if (FLOATN_NX_TYPE_NODE (i) == NULL_TREE) continue;
path is taken.
This patch fixes it to cpp_warn on the non-extended types for C++23
if the target doesn't support them and cpp_define_warn as before if it does.
2025-08-27 Jakub Jelinek <jakub@redhat.com>
PR target/121520
* c-cppbuiltin.cc (c_cpp_builtins): Properly call cpp_warn
for __STDCPP_FLOAT<NN>_T__ if FLOATN_NX_TYPE_NODE (i) is NULL
for C++23 for non-extended types and don't call cpp_warn for
extended types.
Richard Biener [Wed, 27 Aug 2025 07:53:49 +0000 (09:53 +0200)]
tree-optimization/121686 - failed SLP discovery for live recurrence
The following adjusts the SLP build for only-live stmts to not
only consider vect_induction_def and vect_internal_def defs that are
not part of a reduction, but instead all non-reduction defs,
specifically in this case a recurrence def. This is also a missed
optimization on the
gcc-15 branch (but IMO a very minor one).
PR tree-optimization/121686
* tree-vect-slp.cc (vect_analyze_slp): Consider all only-live
non-reduction defs for discovery.
Andrew Pinski [Wed, 27 Aug 2025 03:57:09 +0000 (20:57 -0700)]
testsuite: Fix unprotected-allocas-1.c at -O3 [PR121684]
The problem here is that after r16-101, the 2 functions containing alloca/VLA
start to be cloned, and then un-VLAing happens in using_vararray, so this
is no longer testing what it should be testing.
The obvious fix is to mark using_vararray and using_alloca as noclone too.
Pushed as obvious after a quick test to make sure it is now working.
gcc/testsuite/ChangeLog:
PR testsuite/121684
* c-c++-common/hwasan/unprotected-allocas-0.c: Mark
using_vararray and using_alloca as noclone too.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Tomasz Kamiński [Tue, 19 Aug 2025 14:44:49 +0000 (16:44 +0200)]
libstdc++: Reduce chances of object aliasing for function wrapper.
Previously, an empty functor (EmptyIdFunc) stored inside a
std::move_only_function that is the first member of a Composite class could have the
same address as the base of the EmptyIdFunc type (see included test cases),
resulting in two objects of the same type at the same address.
This commit addresses the issue by moving the internal buffer from the start
of the wrapper object to a position after the manager function pointer. This
minimizes aliasing with the stored buffer but doesn't completely eliminate it,
especially when multiple empty base objects are involved (PR121180).
To facilitate this member reordering, the private section of _Mo_base was
eliminated, and the corresponding _M_manager and _M_destroy members were made
protected. They remain inaccessible to users, as user-facing wrappers derive
from _Mo_base privately.
libstdc++-v3/ChangeLog:
* include/bits/funcwrap.h (__polyfunc::_Mo_base): Reorder _M_manage
and _M_storage members. Make _M_destroy protected and remove friend
declaration.
* testsuite/20_util/copyable_function/call.cc: Add test for aliasing
base class.
* testsuite/20_util/move_only_function/call.cc: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jeff Law [Wed, 27 Aug 2025 00:57:01 +0000 (18:57 -0600)]
More RISC-V testsuite hygiene
More testsuite hygiene. Some of the thead tests are expecting to find
xtheadvdot in the extension set, but it's not defined as a valid extension
anywhere. I'm just removing xtheadvdot. Someone more familiar with these
cores can add it back properly if they're so inclined.
Second, there's a space after the zifencei in a couple of the thead arch
strings. Naturally that causes failures as well. That's a trivial fix, just
remove the bogus whitespace.
That gets us clean on riscv.exp on the pioneer system.
The pioneer is happy, as is riscv32-elf and riscv64-elf. Pushing to the trunk.
gcc/
* config/riscv/riscv-cores.def (xt-c908v): Drop xtheadvdot.
(xt-c910v2): Remove extraneous whitespace.
(xt-c920v2): Drop xtheadvdot and remove extraneous whitespace.
gcc/testsuite/
* gcc.target/riscv/mcpu-xt-c908v.c: Drop xtheadvdot.
* gcc.target/riscv/mcpu-xt-c920v2.c: Drop xtheadvdot.
Sandra Loosemore [Mon, 25 Aug 2025 01:43:50 +0000 (01:43 +0000)]
OpenMP: give error when variant is the same as the base function [PR118839]
As noted in the issue, the C++ front end has deeper problems: it's
supposed to do the name lookup of the variant at the call site but is
instead doing it when parsing the "declare variant" construct, before
registering the decl for the base function. The C++ part of the
patch is a band-aid to catch the case where there is a previous declaration
of the function so that it doesn't give an undefined symbol error instead.
Some real solution ought to be included as part of fixing PR118791.
gcc/c/
PR middle-end/118839
* c-parser.cc (c_finish_omp_declare_variant): Error if variant
is the same as base.
gcc/cp/
PR middle-end/118839
* decl.cc (omp_declare_variant_finalize_one): Error if variant
is the same as base.
gcc/fortran/
PR middle-end/118839
* trans-openmp.cc (gfc_trans_omp_declare_variant): Error if variant
is the same as base.
Sandra Loosemore [Mon, 25 Aug 2025 01:43:49 +0000 (01:43 +0000)]
OpenMP: Improve front-end error-checking for "declare variant"
This patch fixes a number of problems with parser error checking of
"declare variant", especially in the C front end.
The new C testcase unprototyped-variant.c added by this patch used to
ICE when gimplifying the call site, at least in part because the
variant was being recorded even after it was diagnosed as invalid.
There was also a large block of dead code in the C front end that was
supposed to fix up an unprototyped declaration of a variant function
to match the base function declaration, that was never executed because
it was nested in a conditional that could never be true. I've fixed those
problems by rearranging the code and only recording the variant if it
passes the correctness checks. I also tried to add some comments and
re-work some particularly confusing bits of code, so that it's easier to
understand.
The OpenMP specification doesn't say what the behavior of "declare
variant" with the "append_args" clause should be when the base
function is unprototyped. The additional arguments are supposed to be
inserted between the last fixed argument of the base function and any
varargs, but without a prototype, for any given call we have no idea
which arguments are fixed and which are varargs, and therefore no idea
where to insert the additional arguments. This used to trigger some
other diagnostics (which one depending on whether the variant was also
unprototyped), but I thought it was better to just reject this with an
explicit "sorry".
Finally, I also observed that a missing "match" clause was only
rejected if "append_args" or "adjust_args" was present. Per the spec,
"match" has the "required" property, so if it's missing it should be
diagnosed unconditionally. The C++ and Fortran front ends had the same
issue so I fixed this one there too.
gcc/c/ChangeLog
* c-parser.cc (c_finish_omp_declare_variant): Rework diagnostic
code. Do not record variant if there are errors. Make check for
a missing "match" clause unconditional.
gcc/cp/ChangeLog
* parser.cc (cp_finish_omp_declare_variant): Structure diagnostic
code similarly to C front end. Make check for a missing "match"
clause unconditional.
gcc/fortran/ChangeLog
* openmp.cc (gfc_match_omp_declare_variant): Make check for a
missing "match" clause unconditional.
Jeff Law [Tue, 26 Aug 2025 22:50:02 +0000 (16:50 -0600)]
[committed] RISC-V Testsuite hygiene
Shreya and I were working through some testsuite failures and noticed that many
of the current failures on the pioneer were just silly. We have tests that
expect to see full architecture strings in their expected output when the bulk
(some might say all) of the architecture string is irrelevant.
Worse yet, we'd have different matching lines. ie we'd have one that would
machine rv64gc_blah_blah and another for rv64imfa_blah_blah. Judicious
wildcard usage cleans this up considerably.
This fixes ~80 failures in the riscv.exp testsuite. Pushing to the trunk as
it's happy on the pioneer native, riscv32-elf and riscv64-elf.
gcc/testsuite/
* gcc.target/riscv/arch-25.c: Use wildcards to simplify/eliminate
dg-error directives.
* gcc.target/riscv/arch-ss-2.c: Similarly.
* gcc.target/riscv/arch-zilsd-2.c: Similarly.
* gcc.target/riscv/arch-zilsd-3.c: Similarly.
Patrick Palka [Tue, 26 Aug 2025 19:45:57 +0000 (15:45 -0400)]
libstdc++/ranges: Prefer using offset-based _CachedPosition
The offset-based partial specialization of _CachedPosition for
random-access iterators is currently only selected if the offset type is
smaller than the iterator type. Before r12-1018-g46ed811bcb4b86 this
made sense since the main partial specialization only stored the
iterator (incorrectly). After that bugfix, the main partial
specialization now effectively stores a std::optional<iter> so the
size constraint is inaccurate. And this main partial specialization
must invalidate itself upon copy/move unlike the offset-based partial
specialization. So I think we should just always prefer the
offset-based _CachedPosition for a random-access iterator, even if the
offset type happens to be larger than the iterator type.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::_CachedPosition): Remove
additional size constraint on the offset-based partial
specialization.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
David Faust [Tue, 26 Aug 2025 19:14:15 +0000 (12:14 -0700)]
testsuite: restrict ctf-array-7 test to 64-bit targets [PR121411]
The test fails to compile on 32-bit targets because the arrays are too
large. Restrict to targets where the array index type is 64-bits.
Also note the relevant PR in the test comment.
PR debug/121411
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-array-7.c: Restrict to lp64,llp64
targets.
Tomasz Kamiński [Mon, 25 Aug 2025 11:15:35 +0000 (13:15 +0200)]
libstdc++: Do not require assignment for vector::resize(n, v) [PR90192]
This patch introduces a new function, _M_fill_append, which is invoked when
copies of the same value are appended to the end of a vector. Unlike
_M_fill_insert(end(), n, v), _M_fill_append never permutes elements in place,
so it does not require:
* vector element type to be assignable;
* a copy of the inserted value, in the case where it points to an
element of the vector.
vector::resize(n, v) now uses _M_fill_append, fixing the non-conformance where
element types were required to be assignable.
In addition, _M_fill_insert(end(), n, v) now delegates to _M_fill_append, which
eliminates an unnecessary copy of v when the existing capacity is used.
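A short sketch of what the change permits (Pinned is a hypothetical
copy-constructible but non-assignable element type):
    #include <vector>

    struct Pinned
    {
      int value;
      explicit Pinned (int v) : value (v) { }
      Pinned (const Pinned&) = default;
      Pinned& operator= (const Pinned&) = delete;   // not assignable
    };

    int main ()
    {
      std::vector<Pinned> v;
      v.push_back (Pinned (1));
      // Appending copies of the same value no longer requires assignability.
      v.resize (4, Pinned (7));
    }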
PR libstdc++/90192
libstdc++-v3/ChangeLog:
* include/bits/stl_vector.h (vector<T>::_M_fill_append): Declare.
(vector<T>::fill): Use _M_fill_append instead of _M_fill_insert.
* include/bits/vector.tcc (vector<T>::_M_fill_append): Define
(vector<T>::_M_fill_insert): Delegate to _M_fill_append when
elements are appended.
* testsuite/23_containers/vector/modifiers/moveable.cc: Updated
copycount for inserting at the end (appending).
* testsuite/23_containers/vector/modifiers/resize.cc: New test.
* testsuite/backward/hash_set/check_construct_destroy.cc: Updated
copycount, the hash_set constructor uses insert to fill buckets
with nullptrs.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Tue, 19 Aug 2025 13:32:47 +0000 (15:32 +0200)]
libstdc++: Refactor bound arguments storage for bind_front/back
This patch refactors the implementation of bind_front and bind_back to avoid
using std::tuple for argument storage. Instead, bound arguments are now:
* stored directly if there is only one,
* within a dedicated _Bound_arg_storage otherwise.
_Bound_arg_storage is less expensive to instantiate and access than std::tuple.
It can also be trivially copyable, as it doesn't require a non-trivial assignment
operator for reference types. Storing a single argument directly provides similar
benefits compared to both a one-element tuple and _Bound_arg_storage.
_Bound_arg_storage holds each argument in an _Indexed_bound_arg base object.
The base class is parameterized by both type and index to allow storing
multiple arguments of the same type. Invocations are handled by _S_apply_front
amd _S_apply_back static functions, which simulate explicit object parameters.
To facilitate this, the __like_t alias template is now unconditionally available
since C++11 in bits/move.h.
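For context, a tiny use of the wrappers whose bound-argument storage this
changes (illustrative only; std::bind_back needs C++23):
    #include <functional>

    int scale_add (int a, int b, int c) { return a * b + c; }

    int main ()
    {
      auto add10 = std::bind_back (scale_add, 10);         // one bound arg: stored directly
      auto two_three = std::bind_front (scale_add, 2, 3);  // two bound args: _Bound_arg_storage
      int r = add10 (2, 3) + two_three (4);                // scale_add(2,3,10) + scale_add(2,3,4)
      return r == 26 ? 0 : 1;
    }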
libstdc++-v3/ChangeLog:
* include/bits/move.h (std::__like_impl, std::__like_t): Make
available in c++11.
* include/std/functional (std::_Indexed_bound_arg)
(std::_Bound_arg_storage, std::__make_bound_args): Define.
(std::_Bind_front, std::_Bind_back): Use _Bound_arg_storage.
* testsuite/20_util/function_objects/bind_back/1.cc: Expand
test to cover cases of 0, 1, many bound args.
* testsuite/20_util/function_objects/bind_back/111327.cc: Likewise.
* testsuite/20_util/function_objects/bind_front/1.cc: Likewise.
* testsuite/20_util/function_objects/bind_front/111327.cc: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Thu, 21 Aug 2025 16:00:25 +0000 (18:00 +0200)]
libstdc++: Specialize _Never_valueless_alt for jthread, stop_token and stop_source
The move constructors for stop_source and stop_token are equivalent to
copying and clearing the raw pointer, as they are wrappers for a
counted-shared state.
For jthread, the move constructor performs a member-wise move of stop_source
and thread. While std::thread could also have a _Never_valueless_alt
specialization due to its inexpensive move (only moving a handle), doing
so now would change the ABI. This patch takes the opportunity to correct
this behavior for jthread, before the C++20 API is marked stable.
Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR
The patch tries to unroll the vectorized loop when there are
FMA/DOT_PROD_EXPR/SAD_EXPR reductions; this breaks cross-iteration dependences
and enables more parallelism (since vectorization will also enable partial
sums).
When there's gather/scatter or scalarization in the loop, don't do the
unroll since the performance bottleneck is not at the reduction.
The unroll factor for FMA/DOT_PROD_EXPR/SAD_EXPR is set according to
CEIL (latency * throughput, num_of_reduction),
i.e.
for FMA, latency is 4 and throughput is 2, so if there's 1 FMA for the reduction
then the unroll factor is 2 * 4 / 1 = 8.
There's also a vect_unroll_limit; the final suggested_unroll_factor is
set as MIN (vect_unroll_limit, 8).
The vect_unroll_limit is mainly for register pressure, to avoid too many
spills.
Ideally, all instructions in the vectorized loop should be used to
determine the unroll_factor with their (latency * throughput) / number,
but that would be too much for this patch, and may just be GIGO, so the
patch only considers 3 kinds of instructions: FMA, DOT_PROD_EXPR and
SAD_EXPR.
Note that when DOT_PROD_EXPR is not natively supported,
m_num_reduction += 3 * count, which almost prevents unrolling.
There's a performance boost for simple benchmarks with a DOT_PROD_EXPR/FMA
chain, and a slight improvement in SPEC2017 performance.
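A sketch of the heuristic described above (the function and parameter names
are assumptions for illustration, not the actual i386 code):
    /* Suggested unroll factor: CEIL (latency * throughput, num_reductions),
       clamped by the register-pressure limit.  E.g. FMA: CEIL (4 * 2, 1) = 8.  */
    static unsigned
    suggested_unroll_factor (unsigned latency, unsigned throughput,
                             unsigned num_reductions, unsigned unroll_limit)
    {
      unsigned factor = (latency * throughput + num_reductions - 1) / num_reductions;
      return factor < unroll_limit ? factor : unroll_limit;
    }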
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Add new members m_num_reduc and m_prefer_unroll.
(ix86_vector_costs::add_stmt_cost): Set m_prefer_unroll and
m_num_reduc.
(ix86_vector_costs::finish_cost): Determine
m_suggested_unroll_factor with consideration of
reduc_lat_mult_thr, m_num_reduction and
ix86_vect_unroll_limit.
* config/i386/i386.h (enum ix86_reduc_unroll_factor): New
enum.
(processor_costs): Add reduc_lat_mult_thr and
vect_unroll_limit.
* config/i386/x86-tune-costs.h: Initialize
reduc_lat_mult_thr and vect_unroll_limit.
* config/i386/i386.opt: Add -param=ix86-vect-unroll-limit.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect_unroll-1.c: New test.
* gcc.target/i386/vect_unroll-2.c: New test.
* gcc.target/i386/vect_unroll-3.c: New test.
* gcc.target/i386/vect_unroll-4.c: New test.
* gcc.target/i386/vect_unroll-5.c: New test.
RISC-V: Add pattern for reverse floating-point divide
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a div RTL instruction. The vec_duplicate is the
dividend operand.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfdiv.vv v1,v2,v1
After, we get only one:
vfrdiv.vf v1,v1,fa0
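For reference, a hypothetical source loop with a broadcast dividend that this
pattern targets (illustration only, not taken from the testsuite):
    /* The scalar dividend b is broadcast (vfmv.v.f) and then folded into
       vfrdiv.vf by the new pattern.  */
    void
    vfrdiv_vf_example (float *__restrict a, float b, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] = b / a[i];
    }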
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfrdiv_vf_<mode>): Add new pattern to
combine vec_duplicate + vfdiv.vv into vfrdiv.vf.
* config/riscv/vector.md (@pred_<optab><mode>_reverse_scalar): Allow VLS
modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrdiv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for reverse
variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
reverse variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: New test.
Tamar Christina [Tue, 26 Aug 2025 12:10:10 +0000 (13:10 +0100)]
AArch64: extend cost model to cost outer loop vect where the inner loop is invariant [PR121290]
Consider the example:
void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
for (int i = 0; i < 4; ++i)
{
int res = 0;
for (int j = 0; j < 100; ++j)
res += y[j] * z[i];
x[i] = res;
}
}
Outer-loop vectorization here is not useful because we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.
This patch teaches the cost model that if all the leafs are invariant, then we
adjust the loop cost by * VF, since every vector iteration has at least one lane
really just doing 1 scalar.
There are a couple of ways we could have solved this, one is to increase the
unroll factor to process more iterations of the inner loop. This removes the
need for the broadcast, however we don't support unrolling the inner loop within
the outer loop. We only support unrolling by increasing the VF, which would
affect the outer loop as well as the inner loop.
We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable. This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.
We now cost the loop as
note: Cost model analysis:
Vector inside of loop cost: 2000
Vector prologue cost: 4
Vector epilogue cost: 0
Scalar iteration cost: 300
Scalar outside cost: 0
Vector outside cost: 4
prologue iterations: 0
epilogue iterations: 0
missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed: not vectorized: vectorization not profitable.
missed: not vectorized: vector version will never be profitable.
missed: Loop costings may not be worthwhile.
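The shape of the resulting check can be sketched as a standalone function (names and the exact scaling point are assumptions for illustration; this is not the actual aarch64.cc code):
#include <stdbool.h>

/* Sketch only: penalize an outer-loop body whose leaves are all invariant
   by the VF, then reject vectorization when the vector iteration cost
   divided by the scalar iteration cost is >= the vectorization factor,
   which is the rejection quoted in the dump above (2000 / 300 >= 4).  */
static bool
outer_loop_vect_profitable (unsigned vec_inside_cost, unsigned scalar_iter_cost,
                            unsigned vf, bool all_leaves_invariant)
{
  if (all_leaves_invariant)
    vec_inside_cost *= vf;
  return vec_inside_cost < scalar_iter_cost * vf;
}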
Jeff Law [Tue, 26 Aug 2025 12:10:00 +0000 (06:10 -0600)]
Fix RISC-V bootstrap
Recent changes from Kito have an unused parameter. On the assumption that
he's likely going to want it as part of the API, I've simply removed the
parameter's name until such time as Kito needs it.
This should restore bootstrapping for the RISC-V port. Committing now rather
than waiting for the CI system, given that bootstrap builds currently fail.
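For context, the idiom is simply to keep the parameter's type and drop its name, roughly like this (a generic C++ sketch, not the actual riscv_arg_partial_bytes signature):
/* Sketch: an unnamed parameter keeps its slot in the interface but no
   longer triggers -Wunused-parameter during a -Werror bootstrap.  */
static int
example_hook (int used, int /* kept for a possible future use */)
{
  return used;
}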
* config/riscv/riscv.cc (riscv_arg_partial_bytes): Remove name
from unused parameter.
Richard Earnshaw [Tue, 26 Aug 2025 10:55:29 +0000 (11:55 +0100)]
arm: testsuite: make gcc.target/arm/bics_3.c generate bics again
The compiler is getting too smart! But this test is really intended
to check that we generate BICS instead of BIC+CMP, so make the test use
something that we can't subsequently fold away into a bit manipulation
of a store-flag value.
I've also added a couple of extra tests, so we now cover both cases:
where we fold the result away and where that cannot be done.
Also add a test that we don't generate a compare against 0, since
that's really part of what this test is covering.
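One illustrative shape (an assumption about the kind of code involved, not necessarily the added testsuite source) where the AND-NOT result only feeds a conditional branch and so cannot be reduced to a store-flag bit trick:
/* Illustration only: the branch guards a call, so the condition cannot be
   folded into bit manipulation of a store-flag value and the flag-setting
   BICS form is still wanted.  */
void use (int);

void
f (int a, int b)
{
  if (a & ~b)
    use (a);
}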
gcc/testsuite:
* gcc.target/arm/bics_3.c: Add some additional tests that
cannot be folded to a bit manipulation.
Richard Biener [Tue, 26 Aug 2025 07:04:36 +0000 (09:04 +0200)]
Compute vect_reduc_type off SLP node instead of stmt-info
The following changes the vect_reduc_type API to work on the SLP node.
The API is only used from the aarch64 backend, so all changes are there.
In particular I noticed aarch64_force_single_cycle is invoked even
for scalar costing (where the flag tested isn't computed yet); I
figured that in scalar costing all reductions are single-cycle.
* tree-vectorizer.h (vect_reduc_type): Get SLP node as argument.
* config/aarch64/aarch64.cc (aarch64_sve_in_loop_reduction_latency):
Take SLP node as argument and adjust.
(aarch64_in_loop_reduction_latency): Likewise.
(aarch64_detect_vector_stmt_subtype): Adjust.
(aarch64_vector_costs::count_ops): Likewise. Treat reductions
during scalar costing as single-cycle.
Richard Biener [Tue, 26 Aug 2025 08:34:01 +0000 (10:34 +0200)]
tree-optimization/121659 - bogus swap of reduction operands
The following addresses a bogus swapping of SLP operands of a
reduction operation which gets STMT_VINFO_REDUC_IDX out of sync
with the SLP operand order. In fact the most obvious mistake is
that we simply swap operands even on the first stmt, and even when
there's no difference in the comparison operators (for == and !=
at least). But there are more latent issues that I noticed and
fixed up in the process.
PR tree-optimization/121659
* tree-vect-slp.cc (vect_build_slp_tree_1): Do not allow
matching up comparison operators by swapping if that would
disturb STMT_VINFO_REDUC_IDX. Make sure to only
actually mark operands for swapping when there was a
mismatch and we're not processing the first stmt.
Richard Biener [Mon, 25 Aug 2025 12:40:27 +0000 (14:40 +0200)]
Remove STMT_VINFO_REDUC_VECTYPE_IN
This was added when invariants/externals outside of SLP didn't have
an easily accessible vector type. Now it's redundant so the
following removes it.
* tree-vectorizer.h (_stmt_vec_info::reduc_vectype_in): Remove.
(STMT_VINFO_REDUC_VECTYPE_IN): Likewise.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Get
at the input vectype via the SLP node child.
(vectorizable_lane_reducing): Likewise.
(vect_transform_reduction): Likewise.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN.
Jakub Jelinek [Tue, 26 Aug 2025 04:43:39 +0000 (06:43 +0200)]
i386: Fix up recent changes to use GFNI for rotates/shifts [PR121658]
The vgf2p8affineqb_<mode><mask_name> pattern uses "register_operand"
predicate for the first input operand, so using "general_operand"
for the rotate operand passed to it leads to ICEs, and so does
the "nonimmediate_operand" in the <insn>v16qi3 define_expand.
The following patch fixes it by using "register_operand" in the former
case (that pattern is TARGET_GFNI only) and using force_reg in
the latter case (the pattern is TARGET_XOP || TARGET_GFNI and for XOP
we can handle a MEM operand).
The rest of the changes are small formatting tweaks or use of const0_rtx
instead of GEN_INT (0).
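For reference, the kind of source that reaches these rotate patterns is a per-element byte rotate like the sketch below (illustrative only, not the PR testcase; assumes -mgfni, with the operand coming from memory):
/* Illustrative: rotate each byte left by 3; reading the operand straight
   from memory is the situation where a register_operand predicate must not
   be handed a MEM.  */
typedef unsigned char v16qi __attribute__ ((vector_size (16)));

v16qi
rotl3 (const v16qi *p)
{
  v16qi x = *p;
  return (x << 3) | (x >> 5);
}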
2025-08-26 Jakub Jelinek <jakub@redhat.com>
PR target/121658
* config/i386/sse.md (<insn><mode>3 any_shift): Use const0_rtx
instead of GEN_INT (0).
(cond_<insn><mode> any_shift): Likewise. Formatting fix.
(<insn><mode>3 any_rotate): Use register_operand predicate instead of
general_operand for match_operand 1. Use const0_rtx instead of
GEN_INT (0).
(<insn>v16qi3 any_rotate): Use force_reg on operands[1]. Formatting
fix.
* config/i386/i386.cc (ix86_shift_rotate_cost): Comment formatting
fixes.
Pan Li [Sat, 23 Aug 2025 04:55:50 +0000 (12:55 +0800)]
RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost
This patch combines vec_duplicate + vmacc.vv into vmacc.vx for the example
code below. The related pattern depends on the cost of a vec_duplicate from
GR2VR: late-combine performs the combination if the GR2VR cost is zero and
rejects it if the GR2VR cost is greater than zero.
Assume we have example code like the one below, with a GR2VR cost of 0.
#define DEF_VX_TERNARY_CASE_0(T, OP_1, OP_2, NAME)                           \
  void                                                                       \
  test_vx_ternary_##NAME##_##T##_case_0 (T * restrict vd, T * restrict vs2,  \
                                         T rs1, unsigned n)                  \
  {                                                                          \
    for (unsigned i = 0; i < n; i++)                                         \
      vd[i] = vd[i] OP_2 vs2[i] OP_1 rs1;                                    \
  }
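A concrete instantiation of that macro might look like this (illustrative usage only; the chosen type and name are assumptions):
#include <stdint.h>

/* Expands to vd[i] = vd[i] + vs2[i] * rs1, the vmacc shape; with a GR2VR
   cost of 0 the scalar rs1 stays in a GPR and the loop can use vmacc.vx
   instead of vmv.v.x + vmacc.vv.  */
DEF_VX_TERNARY_CASE_0(int32_t, *, +, macc)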
* config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Add new pattern to
generate vmacc rtl.
(*pred_macc_<mode>_scalar_undef): Ditto.
* config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Add new
pattern to match the vmacc vx combine.
Jakub Jelinek [Mon, 25 Aug 2025 22:28:10 +0000 (00:28 +0200)]
omp-expand: Initialize fd->loop.n2 if needed for the zero iter case [PR121453]
When expand_omp_for_init_counts is called from expand_omp_for_generic,
zero_iter1_bb is NULL and the code always creates a new bb in which it
clears fd->loop.n2 var (if it is a var), because it can dominate code
with lastprivate guards that use the var.
When called from other places, zero_iter1_bb is non-NULL and so we don't
insert the clearing (and can't, because the same bb is used also for the
non-zero iterations exit and in that case we need to preserve the iteration
count). Clearing is also not necessary when e.g. the outermost collapsed
loop has a constant non-zero number of iterations; in that case the var has
already been initialized earlier. The following patch makes sure to clear
it, if it hasn't been initialized yet, before the first check for zero iterations.
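An illustrative shape of the kind of construct involved (not the PR testcase): a collapsed loop nest whose total iteration count can be zero at run time, combined with lastprivate.
/* Illustration only: when n or m is zero the collapsed iteration count is
   zero, and the lastprivate handling must not read an uninitialized
   fd->loop.n2.  */
int
last_index (int n, int m)
{
  int last = -1;
#pragma omp parallel for collapse(2) lastprivate(last)
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      last = i * m + j;
  return last;
}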
2025-08-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121453
* omp-expand.cc (expand_omp_for_init_counts): Clear fd->loop.n2
before first zero count check if zero_iter1_bb is non-NULL upon
entry and fd->loop.n2 has not been written yet.
David Faust [Wed, 6 Aug 2025 16:24:40 +0000 (09:24 -0700)]
ctf: avoid overflow for array num elements [PR121411]
CTF array encoding uses uint32 for number of elements. This means there
is a hard upper limit on array types which the format can represent.
GCC internally was also using a uint32_t for this, which would overflow
when translating from DWARF for arrays with more than UINT32_MAX
elements. Use an unsigned HOST_WIDE_INT instead to fetch the array
bound, and fall back to CTF_K_UNKNOWN if the array cannot be
represented in CTF.
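For instance (a hedged illustration valid on LP64 targets; the name is made up), an array type like the following has more elements than the 32-bit field can hold and now becomes CTF_K_UNKNOWN instead of silently wrapping:
/* Illustration: 2^33 elements cannot be encoded in CTF's uint32 element
   count.  */
extern char huge_array[1ULL << 33];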
PR debug/121411
gcc/
* dwarf2ctf.cc (gen_ctf_subrange_type): Use unsigned HWI for
array_num_elements. Fall back to CTF_K_UNKNOWN if the array
type has too many elements for CTF to represent.
After changing the return type of remove_prop_source_from_use,
forward_propagate_into_comparison will never return 2. So boolify
forward_propagate_into_comparison.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (forward_propagate_into_comparison): Boolify.
(pass_forwprop::execute): Don't handle return of 2 from
forward_propagate_into_comparison.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Sun, 24 Aug 2025 20:33:21 +0000 (13:33 -0700)]
forwprop: Remove return type of remove_prop_source_from_use
Since r5-4705-ga499aac5dfa5d9, remove_prop_source_from_use has always
returned false. This removes the return type of remove_prop_source_from_use
and cleans up its usage.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (remove_prop_source_from_use): Remove
return type.
(forward_propagate_into_comparison): Update for
remove_prop_source_from_use no longer having a return value.
(forward_propagate_into_gimple_cond): Likewise.
(simplify_permutation): Likewise.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Sun, 24 Aug 2025 06:47:22 +0000 (23:47 -0700)]
forwprop: Mark the old switch index for (maybe) dceing
While looking at this code I noticed that we don't remove the assignment
of the old switch index, even if it is only used in the switch, after
the switch is modified in simplify_gimple_switch.
This fixes that by marking the old switch index for the DCE worklist.
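To make that concrete, here is an illustrative example (not from the testsuite): once the switch is rewritten to use the narrower operand directly, the widening conversion feeding it may have no remaining uses and can now be cleaned up by the queued DCE.
void sink (int);

void
dispatch (short s)
{
  int i = s;   /* after simplify_gimple_switch the switch uses s directly,
                  so this conversion can become dead */
  switch (i)
    {
    case 1:  sink (10); break;
    case 2:  sink (20); break;
    default: sink (0);  break;
    }
}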
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_gimple_switch): Add simple_dce_worklist
argument. Mark the old index when doing the replacement.
(pass_forwprop::execute): Update call to simplify_gimple_switch.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Fri, 15 Aug 2025 23:17:35 +0000 (16:17 -0700)]
Rewrite bool loads for undefined case [PR121279]
Just like r16-465-gf2bb7ffe84840d8, but this time instead of a VCE there
is a full load from a boolean.
This showed up when trying to remove the extra copy in the testcase from
the revision mentioned above (pr120122-1.c).
So when moving loads from a boolean type from conditional to
non-conditional, the load needs to become a full load and then be cast
to a bool so that the upper bits are correct.
Bitfield loads always do the truncation, so they don't need to be
rewritten; non-boolean types always do the truncation too.
What we do is wrap the original reference in a VCE, which causes the full
load, and then add a cast to do the truncation. Using fold_build1 with VCE
will do the correct thing if there is a secondary VCE and will also fold
if this was just a plain MEM_REF, so there is no need to handle those two
cases specially either.
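As a rough illustration of the situation (an assumed shape, not the PR testcase): if the guarded load below is speculated and made unconditional, the bool value must be produced by loading the whole byte and truncating it, rather than by a plain bool-typed load.
/* Illustration only: speculating the load of *p means reading a byte whose
   contents may not be a valid 0/1 pattern on the skipped path; the VCE plus
   cast keeps the result well defined.  */
int
count_true (const _Bool *p, int n, int flag)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (flag)
      sum += *p;
  return sum;
}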
Changes since v1:
* v2: Use VIEW_CONVERT_EXPR instead of doing a manual load.
      Accept all non-mode-precision loads rather than just boolean ones.
* v3: Move back to checking boolean type.  Don't handle BIT_FIELD_REF.
      Add asserts for IMAG/REAL_PART_EXPR.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121279
gcc/ChangeLog:
* gimple-fold.cc (gimple_needing_rewrite_undefined): Return
true for non-mode-precision boolean loads.
(rewrite_to_defined_unconditional): Handle non-mode-precision loads.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121279-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 19 Aug 2025 21:52:18 +0000 (14:52 -0700)]
LIM: Manually put uninit decl into ssa
When working on PR121279, I noticed that lim would create an
uninitialized decl and mark it with a suppression for the
uninitialized-use warning.
This is fine, but the rewrite into SSA would then just call
get_or_create_ssa_default_def on that new decl, which could in theory
take some extra compile time to figure that out.
Plus, when doing the rewriting for undefinedness, there would now be a
VCE around the decl. This means the SSA name is kept around and not
propagated in some cases.
So instead this patch manually calls get_or_create_ssa_default_def to
get the "uninitialized" SSA name for this decl, and no longer needs the
rewrite into SSA nor the rewrite for undefinedness.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-loop-im.cc (execute_sm): Call
get_or_create_ssa_default_def for the new uninitialized
decl.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
xtensa: Make use of compact insn definition syntax for insns that have multiple alternatives
The use of compact syntax makes the relationship between asm output,
operand constraints, and insn attributes easier to understand and modify,
especially for "mov<mode>_internal".
* config/xtensa/xtensa.md
(The auxiliary define_split for *masktrue_const_bitcmpl):
Use a more concise function call, i.e.,
(1 << GET_MODE_BITSIZE (mode)) - 1 is equivalent to
GET_MODE_MASK (mode).
* config/xtensa/xtensa.md (mode_bits):
New mode attribute.
(zero_extend<mode>si2): Use the appropriate mode iterator and
attribute to unify "zero_extend[hq]isi2" to this description.