git.ipfire.org Git - thirdparty/gcc.git/log

lto: Add an entry for cold attribute to lto_gnu_attributes

PR 118125 is a performance regression stemming from the fact that we
lose the cold attribute of our __builtin_unreachable.  The attribute
is simply and silently dropped on the floor by decl_attributes (in
attribs.cc) in the process of building decls for builtins because it
cannot look it up in the gnu attribute name space by
lookup_scoped_attribute_spec.  For that not to happen it must be in
lto_gnu_attributes and this patch adds it there.
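
The silent drop described above can be pictured as a table lookup that
discards every attribute name it does not recognize.  The sketch below is
a hypothetical, heavily simplified model (attr_spec, known_attrs,
lookup_attr and apply_attrs are invented for illustration); the real
lto_gnu_attributes table and decl_attributes in attribs.cc are
considerably more involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a front end's table of known GNU attributes.  */
struct attr_spec
{
  const char *name;
};

static const struct attr_spec known_attrs[] = {
  { "noreturn" },
  { "nothrow" },
  { "cold" }, /* the kind of entry this patch adds */
};

/* Return the spec for NAME, or NULL if the table has no such entry.  */
static const struct attr_spec *
lookup_attr (const char *name)
{
  for (size_t i = 0; i < sizeof known_attrs / sizeof known_attrs[0]; i++)
    if (strcmp (known_attrs[i].name, name) == 0)
      return &known_attrs[i];
  return NULL;
}

/* Copy to OUT only the attributes that lookup_attr recognizes.  Unknown
   names are dropped without any diagnostic, mirroring the silent loss
   described above.  Returns the number of attributes kept.  */
static size_t
apply_attrs (const char *const *attrs, size_t n, const char **out)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (lookup_attr (attrs[i]) != NULL)
      out[kept++] = attrs[i];
  return kept;
}
```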

In comment 13 of the bug, Andrew identified other attributes which are
in builtin-attrs.def but missing in lto_gnu_attributes but apart from
cold it seems that they are either not used in builtins.def or are
used in DEF_LIB_BUILTIN which I guess might be less critical?
Eventually I decided to go for the most simple of patches and only add
things if they are requested.  For the same reason I also did not add
any checking to the attribute "handle" callback or any exclusion check.
To me, they seem to be mostly relevant before the LTO FE kicks in, but
again, I'm happy to add any if they seem to be useful.

Since Ian fixed PR 118746, the same issue has also been fixed in the
Go front-end and so I have added a simple checking assert to the
redirect_to_unreachable function to make sure it has the intended
effect.

gcc/ChangeLog:

2025-02-03  Martin Jambor  <mjambor@suse.cz>

PR lto/118125
* ipa-fnsummary.cc (redirect_to_unreachable): Add checking assert
that the builtin_unreachable decl has attribute cold.

gcc/lto/ChangeLog:

2025-02-03  Martin Jambor  <mjambor@suse.cz>

PR lto/118125
* lto-lang.cc (lto_gnu_attributes): Add an entry for cold attribute.
(handle_cold_attribute): New function.

c++: Reject cdtors and conversion operators with a single * as return type [PR118304, PR118306]

We currently accept the following constructor declaration (clang, EDG
and MSVC do as well), and ICE on the destructor declaration:

=== cut here ===
struct A {
*A ();
~A () = default;
};
=== cut here ===

The problem is that we end up in grokdeclarator with a cp_declarator of
kind cdk_pointer but no type, and we happily go through (if we have a
reference instead we eventually error out trying to form a reference to
void).

This patch makes sure that grokdeclarator errors out and strips the
invalid declarator when processing a cdtor (or a conversion operator
with no return type specified) with a declarator representing a pointer
or a reference type.

PR c++/118306
PR c++/118304

gcc/cp/ChangeLog:

* decl.cc (maybe_strip_indirect_ref): New.
(check_special_function_return_type): Take declarator as input.
Call maybe_strip_indirect_ref and error out if it returns true.
(grokdeclarator): Update call to
check_special_function_return_type.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.jason/operator.C: Adjust bogus test
expectation (char** vs char*).
* g++.dg/parse/constructor4.C: New test.
* g++.dg/parse/constructor5.C: New test.
* g++.dg/parse/conv_op2.C: New test.
* g++.dg/parse/default_to_int.C: New test.

sarif-replay: fix off-by-one in handling of "endColumn" (§3.30.8) [PR118792]

gcc/ChangeLog:
PR sarif-replay/118792
* libsarifreplay.cc (sarif_replayer::handle_region_object): Fix
off-by-one in handling of endColumn property so that the code
matches the comment and the SARIF spec (§3.30.8).

gcc/testsuite/ChangeLog:
PR sarif-replay/118792
* sarif-replay.dg/2.1.0-valid/error-with-note.sarif: Update
expected output to reflect fix to off-by-one error in handling of
"endColumn" property.
* sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
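
A minimal sketch of the endColumn convention, assuming (per our reading
of SARIF 2.1.0 §3.30.8) that endColumn names the column just past the
last character of the region, so an inclusive "finish" column is one
less.  The struct and function names are hypothetical and do not
correspond to libsarifreplay.cc.

```c
/* Hypothetical inclusive column range, 1-based.  */
struct col_range
{
  int start;  /* first column of the region */
  int finish; /* last column of the region, inclusive */
};

/* Convert a SARIF region's startColumn/endColumn pair, where endColumn is
   exclusive, into an inclusive range.  Assumes a non-empty region, i.e.
   end_column > start_column.  */
static struct col_range
sarif_region_to_inclusive (int start_column, int end_column)
{
  struct col_range r;
  r.start = start_column;
  r.finish = end_column - 1; /* last character is one column earlier */
  return r;
}
```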

Synchronize include/dwarf2.def with binutils

The contents of include/dwarf2.def have diverged between the gcc and
the binutils repositories.  Currently, it's impossible to build a combined
tree, as GCC won't build with the binutils version of dwarf2.def and binutils
won't build with the gcc version.  This patch realigns this file by copying
the definition of DW_CFA_AARCH64_negate_ra_state_with_pc from binutils,
restoring the ability to build a combined source tree.

2025-02-11  Roger Sayle  <roger@nextmovesoftware.com>

include/ChangeLog
* dwarf2.def (DW_CFA_AARCH64_negate_ra_state_with_pc): Define.

tree-optimization/118817 - missed folding of PRE inserted code

When PRE inserts code, that code is not fully folded with SSA edges
followed, which can cause missed optimizations because the next full
folding pass runs much later, after strlen; in the PR's case this leads
to diagnostics being emitted on dead code.

The following mitigates the missed expression canonicalization that
happens during PHI translation, where the expressions to be inserted are
computed.  It is largely a refactoring that eliminates the single use of
fully_constant_expression; otherwise it leverages the work already done
by vn_nary_simplify by updating the NARY with the simplified expression.

PR tree-optimization/118817
* tree-ssa-pre.cc (fully_constant_expression): Fold into
the single caller.
(phi_translate_1): Refactor folded in fully_constant_expression.
* tree-ssa-sccvn.cc (vn_nary_simplify): Update the NARY with
the simplified expression.

* g++.dg/lto/pr118817_0.C: New testcase.

testsuite: Fix g++.dg/modules/adl-5

This testcase wasn't running, because adl-5_a had the wrong extension.
adl-5_d should have been reporting an error because 'frob' is only
visible from within the 'hidden' module but this was missed.

gcc/testsuite/ChangeLog:

* g++.dg/modules/adl-5_a.c: Move to...
* g++.dg/modules/adl-5_a.C: ...here.
* g++.dg/modules/adl-5_d.C: Add errors.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Fix use-after-free of replaced friend instantiation [PR118807]

When instantiating a friend function, we call register_specialization
which adds it to the DECL_TEMPLATE_INSTANTIATIONS of the template.
However, in some circumstances we might immediately call pushdecl and
find an existing specialisation. In this case, when reregistering the
specialisation we also need to update the DECL_TEMPLATE_INSTANTIATIONS
list so that we don't try to access the freed spec again later.

PR c++/118807

gcc/cp/ChangeLog:

* pt.cc (reregister_specialization): Remove spec from
DECL_TEMPLATE_INSTANTIATIONS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118807.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
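
A minimal sketch of the idea behind the fix, assuming the instantiations
list is modelled as a plain singly linked list (the real
DECL_TEMPLATE_INSTANTIATIONS is a TREE_LIST, and struct spec and
unlink_spec are hypothetical): before a replaced specialization is
freed, it must be unlinked so that a later walk of the list cannot reach
it.

```c
/* Hypothetical model of one entry in a template's instantiation list.  */
struct spec
{
  int id;
  struct spec *next;
};

/* Unlink OLD from *LIST, if present, so later walks of the list cannot
   reach it once it is freed.  Returns 1 if OLD was found, 0 otherwise.  */
static int
unlink_spec (struct spec **list, const struct spec *old)
{
  for (struct spec **p = list; *p != 0; p = &(*p)->next)
    if (*p == old)
      {
        *p = old->next; /* bypass OLD */
        return 1;
      }
  return 0;
}
```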

x86: Correct ASM_OUTPUT_SYMBOL_REF

x is not a macro parameter.  It just happens to work because final.cc
passes a variable named x as the 2nd argument:

final.cc: ASM_OUTPUT_SYMBOL_REF (file, x);

PR target/118825
* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
SYM.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
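
The pitfall is the classic non-hygienic macro: a body that names a
variable which is not one of its parameters only compiles when every
expansion site happens to have a variable of that name in scope.  A
hypothetical, self-contained illustration (TWICE_BUGGY and TWICE_FIXED
are invented and only mimic the shape of the ASM_OUTPUT_SYMBOL_REF
problem):

```c
/* Buggy: the body uses "x", not the parameter SYM, so it silently picks
   up whatever x is in scope where the macro is expanded.  */
#define TWICE_BUGGY(SYM) ((x) * 2)

/* Fixed: the body refers only to its own parameter.  */
#define TWICE_FIXED(SYM) ((SYM) * 2)

static int
caller_named_x (int x)
{
  return TWICE_BUGGY (x); /* works only because the argument is named x */
}

static int
caller_named_y (int y)
{
  return TWICE_FIXED (y); /* TWICE_BUGGY (y) would not compile here */
}
```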

config.gcc: Support mips*64*-linux-muslabi64 as ABI64 by default

LLVM has introduced support for this triple.  Let's sync with it.

gcc
* config.gcc: Add mips*64*-linux-muslabi64 triple support.

MIPS: Add some floating point instructions support for MIPSr6

This patch adds some of the floating-point instructions from
MIPS32 Release 6(mips32r6) with their respective built-in
functions and tests:

    min_a_s, min_a_d
    max_a_s, max_a_d
    rint_s, rint_d
    class_s, class_d

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_minmax): Include
fclass type.
(i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
for __builtin_mipsr6_xxx.
(MIPSR6_BUILTIN_PURE): Same as above.
(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
New code_aliasing macros.
(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
max_a_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
(ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec.
(UNSPEC_FCLASS): Same as above.
(type): Add frint and fclass.
(fmin_a_<mode>): Generates MINA.fmt instructions.
(fmax_a_<mode>): Generates MAXA.fmt instructions.
(rint<mode>2): Generates RINT.fmt instructions.
(fclass_<mode>): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include
frint type.
(p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-class.c: New tests for MIPSr6
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.

Signed-off-by: Jie Mei <jie.mei@oss.cipunited.com>
Co-authored-by: Xi Ruoyao <xry111@xry111.site>
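
For reference, the scalar effect of MINA.fmt and MAXA.fmt can be
approximated in C as selecting the operand with the smaller or larger
magnitude while keeping its sign.  This is a hypothetical model
(ref_min_a and ref_max_a are not the new builtins), and it deliberately
ignores NaN operands and ties in magnitude, which the MIPS32 Release 6
manual specifies separately.

```c
#include <math.h>

/* Approximate MINA.fmt: return the operand with the smaller magnitude,
   with its original sign.  NaNs and equal magnitudes are not modelled.  */
static double
ref_min_a (double a, double b)
{
  return fabs (a) < fabs (b) ? a : b;
}

/* Approximate MAXA.fmt: return the operand with the larger magnitude,
   with its original sign.  NaNs and equal magnitudes are not modelled.  */
static double
ref_max_a (double a, double b)
{
  return fabs (a) > fabs (b) ? a : b;
}
```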

libphobos: Disable libphobos.phobos/std/concurrency.d on macOS 13+ [PR111628]

The libphobos.phobos_shared/std/concurrency.d test just hangs on macOS
13 and beyond and isn't even terminated after the testsuite timeout is
exceeded.  Thus, more and more concurrency.exe processes keep
accumulating, consuming CPU time for nothing.

To avoid this, this patch skips the test on macOS 13+.  The static test
SEGVs immediately instead, but I'm skipping it too for symmetry.

Tested on macOS 15 (where it becomes UNSUPPORTED) and 12 (where it still
PASSes).

I have no idea what happens on Darwin/arm64, so currently the skipping
is restricted to Darwin/x86_64.

2025-02-10  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

PR d/111628
* testsuite/libphobos.phobos/phobos.exp (libphobos_skip_tests):
Add libphobos.phobos/std/concurrency.d on macOS 13+.
* testsuite/libphobos.phobos_shared/phobos_shared.exp
(libphobos_skip_tests): Likewise for
libphobos.phobos_shared/std/concurrency.d

testsuite: LoongArch: Remove from btrunc, ceil, and floor effective target allowlist

Now that the default C standard is C23, we can no longer use LSX/LASX
instructions for these operations, as the standard disallows raising
INEXACT exceptions.  So LoongArch is no longer suitable for these
effective targets.

This fixes the test failures on gcc.dg/vect/vect-rounding-*.c.  For the
old standards or -ffp-int-builtin-inexact, we already provide test
coverage with gcc.target/loongarch/vect-ftint.c.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_btrunc): Drop LoongArch.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floor): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_lfloor): Likewise.
(check_effective_target_vect_call_lfloorf): Likewise.

i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]

When moving intrinsics around for the AVX10 implementation in GCC 14,
the intrinsics _kshiftli_mask32 and _kshiftri_mask32 were wrongly
wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__",
leaving the intrinsic header not `-Wsystem-headers -Wundef` clean
since r14-4490.

gcc/ChangeLog:

PR target/118813
* config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
wrap.
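
The difference between the two directives, sketched with a hypothetical
macro HYPOTHETICAL_FEATURE that is never defined.  GCC defines
__OPTIMIZE__ to 1 when optimizing, so in that case both forms pick the
same branch; they only differ when the macro is undefined (e.g. at -O0),
where #if silently evaluates it as 0 but -Wundef warns.

```c
/* HYPOTHETICAL_FEATURE is deliberately never defined in this file.  */

static int
feature_enabled (void)
{
  /* "#if HYPOTHETICAL_FEATURE" would also select the #else branch, since
     an undefined identifier evaluates to 0 in #if, but -Wundef warns about
     it.  #ifdef only asks whether the macro is defined, so it never warns.  */
#ifdef HYPOTHETICAL_FEATURE
  return 1;
#else
  return 0;
#endif
}
```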

PR modula2/118761: gm2 driver doesn't behave as gcc for -fhelp=BLA

This patch enables the gm2 driver to handle, for example,
-fsyntax-only -fhelp=optimizers correctly, without terminating with
"gm2: fatal error: no input files".

gcc/m2/ChangeLog:

PR modula2/118761
* gm2spec.cc (lang_specific_driver): Add case clauses for
OPT__help and OPT__help_, which set in_added_libraries to 0 and
return early.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

libbacktrace: add cast to avoid undefined shift

Patch from pgerell@github.

* elf.c (elf_uncompress_lzma_block): Add casts to avoid
potentially shifting a value farther than its type size.
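
A minimal sketch of the kind of problem such casts avoid, using a
hypothetical read_le40 helper rather than the actual elf.c code: an
unsigned char operand is promoted to int before the shift, so shifting
it by 32 or more without first widening it is undefined behaviour (and
even a shift by 24 can overflow int).

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 40-bit little-endian value from five
   bytes.  Each byte is widened to uint64_t before shifting, so every
   shift count is smaller than the width of the shifted type.  */
static uint64_t
read_le40 (const unsigned char *p)
{
  return ((uint64_t) p[0]
          | ((uint64_t) p[1] << 8)
          | ((uint64_t) p[2] << 16)
          | ((uint64_t) p[3] << 24)
          | ((uint64_t) p[4] << 32));
}
```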

This improves an error message, avoiding the awkward "at ... at" wording.

gcc/fortran/ChangeLog:

PR fortran/24878
* interface.cc (compare_parameter): Better wording on
error message.

gcc/testsuite/ChangeLog:

PR fortran/24878
* gfortran.dg/interface_51.f90: Adjust expected error message.

Fortran: checking of pointer targets for structure constructors [PR56423]

Check that the target of a pointer component in a structure constructor has
the same rank, and that the initial-data-target does not have vector subscripts.

PR fortran/56423

gcc/fortran/ChangeLog:

* resolve.cc (resolve_structure_cons): Check rank of pointer target;
reject pointer target with vector subscripts.

gcc/testsuite/ChangeLog:

* gfortran.dg/derived_constructor_comps_2.f90: Adjust test.
* gfortran.dg/derived_constructor_comps_8.f90: New test.

[gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has

Assume that a distro has configured, e.g., a gfx9-generic multilib but
none for gfx902.  In that case, mkoffload would fail to link with "error:
incompatible mach".  With this commit, an error is printed suggesting to try
the associated generic architecture instead.  The behavior is unchanged if
there is a multilib available for the specific ISA or when there is also no
multilib for the generic ISA.

Note: Building generic multilibs is currently not enabled by default;
it also requires the linker/assembler of LLVM 19 or newer and, in particular
for execution, a future ROCm release.  (The next one?  In any case, 6.3.2
does not support generic ISAs yet.)

gcc/ChangeLog:

* config/gcn/mkoffload.cc (enum elf_arch_code): Add
EF_AMDGPU_MACH_AMDGCN_NONE.
(elf_arch): Use enum elf_arch_code as type.
(tool_cleanup): Silence warning by removing trailing '.' from error.
(get_arch_name): Return enum elf_arch_code.
(check_for_missing_lib): New; print fatal error if the multilib
is not available but it is for the associated generic ISA.
(main): Call it.
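
The decision described above might look roughly like the following.
Everything here is hypothetical (the function names, and the list of
installed multilibs passed in as strings); only the gfx902 to
gfx9-generic pairing is taken from the text, and the actual
check_for_missing_lib in mkoffload.cc is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical map from a specific ISA to its generic ISA; only the pair
   mentioned in the commit message is filled in.  */
static const char *
generic_isa_for (const char *isa)
{
  if (strcmp (isa, "gfx902") == 0)
    return "gfx9-generic";
  return NULL;
}

/* Is ISA among the N installed multilibs in LIBS?  */
static bool
have_multilib (const char *isa, const char *const *libs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (libs[i], isa) == 0)
      return true;
  return false;
}

/* Return the generic ISA to suggest, or NULL when no suggestion applies:
   the specific multilib exists, or the generic one is missing too.  */
static const char *
suggest_generic (const char *isa, const char *const *libs, size_t n)
{
  if (have_multilib (isa, libs, n))
    return NULL;
  const char *generic = generic_isa_for (isa);
  if (generic != NULL && have_multilib (generic, libs, n))
    return generic;
  return NULL;
}
```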

[gcn] install.texi: Update for new ISA targets and their requirements

GCN now supports several additional ISA targets, so not all targets
have a multilib by default any longer; add a note about this, the
generic targets and the required LLVM (and ROCm) versions.

gcc/ChangeLog:

* doc/install.texi (GCN): Update section about multilibs and
required LLVM version.

ipa-cp: Perform operations in the appropriate types (PR 118097)

One of the testcases from PR 118097 and the one from PR 118535 show
that the fix to PR 118138 was incomplete.  We must not only make sure
that (intermediate) results of operations performed by IPA-CP are
fold_converted to the type of the destination formal parameter but we
must also decouple these types from the ones in which operations
are performed.
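
A small C illustration of why the operation type and the destination
type must be kept apart, assuming a 32-bit unsigned operation whose
result feeds a 64-bit parameter (the function names are hypothetical and
unrelated to ipa-cp.cc): performing the addition in the wider
destination type loses the wrap-around the program actually has.

```c
#include <stdint.h>

/* Correct order: compute in the operation's type (wrapping modulo 2^32),
   then convert the result to the destination parameter's type.  */
static uint64_t
add_in_operation_type (uint32_t a, uint32_t b)
{
  return (uint64_t) (uint32_t) (a + b);
}

/* Wrong for this purpose: computing directly in the destination type
   yields a different value whenever the 32-bit addition would wrap.  */
static uint64_t
add_in_destination_type (uint32_t a, uint32_t b)
{
  return (uint64_t) a + (uint64_t) b;
}
```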

This patch does that, even though we do not store or stream the
operation types; instead, we simply limit ourselves to tcc_comparisons
and operations for which the first operand and the result are of the
same type as determined by expr_type_first_operand_type_p.  If we
wanted to go beyond these, we would indeed need to store/stream the
respective operation type.

ipa_value_from_jfunc needs an additional check that res_type is not
NULL because it is not called just from within IPA-CP (where we know
we have a destination lattice slot belonging to a defined parameter)
but also from inlining, ipa-fnsummary and ipa-modref where it is used
to examine a call to a function with variadic arguments and we do not
have types for the unknown parameters.  But we cannot really work with
those or estimate any benefits when it comes to them, so ignoring them
should be OK.

Even after this patch, ipa_get_jf_arith_result has a parameter called
res_type in which it performs operations for aggregate jump functions,
where we do not allow type conversions when constructing the jump
functions and the type is the type of the stored data.  In GCC 16, we
could relax this and allow conversions like for scalars.

gcc/ChangeLog:

2025-01-20  Martin Jambor  <mjambor@suse.cz>

PR ipa/118097
* ipa-cp.cc (ipa_get_jf_arith_result): Adjust comment.
(ipa_get_jf_pass_through_result): Removed.
(ipa_value_from_jfunc): Use directly ipa_get_jf_arith_result, do
not specify operation type but make sure we check and possibly
convert the result.
(get_val_across_arith_op): Remove the last parameter, always pass
NULL_TREE to ipa_get_jf_arith_result in its last argument.
(propagate_vals_across_arith_jfunc): Do not pass res_type to
get_val_across_arith_op.
(propagate_vals_across_pass_through): Add checking assert that
parm_type is not NULL.

gcc/testsuite/ChangeLog:

2025-01-24  Martin Jambor  <mjambor@suse.cz>

PR ipa/118097
* gcc.dg/ipa/pr118097.c: New test.
* gcc.dg/ipa/pr118535.c: Likewise.
* gcc.dg/ipa/ipa-notypes-1.c: Likewise.

arm: fix typo in dg-require-effective-target [PR118089]

Trivial typo.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c (dg-require-effective-target): Fix
typo in directive.

i386: Change RTL representation of bt[lq] [PR118623]

The following testcase is miscompiled because the RTL representation
of a bt{l,q} insn followed by e.g. j{c,nc} is misleading about what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
               >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set.  These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC this is no big deal, as the middle-end doesn't know what
set or unset means.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it understands
the exact opposite.  If zero_extract is 1, flags is set to the
comparison of 1 and 0, and that would mean using ne in the
if_then_else, and vice versa.

So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything.  The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly.  If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U.  The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
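
The reasoning in the last paragraph can be checked with a tiny C model
of the new representation.  It is a hypothetical sketch, not GCC code:
the flags are modelled as the unsigned comparison of 0 with the
extracted bit, so LTU (carry set, jc) holds exactly when the bit is 1,
and GEU (jnc) exactly when it is 0.

```c
#include <stdbool.h>

/* Model of (compare:CCC (const_int 0) (zero_extract VALUE 1 BITPOS))
   followed by an LTU test.  BITPOS must be below 64.  */
static bool
bt_carry (unsigned long long value, unsigned int bitpos)
{
  unsigned long long bit = (value >> bitpos) & 1; /* zero_extract, 1 bit */
  return 0ULL < bit;                              /* LTU: carry set, jc */
}
```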

2025-02-10  Jakub Jelinek  <jakub@redhat.com>

PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt<SWI48:mode>_mask): Likewise.
(*jcc_bt<mode>): Likewise.  Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise.
(*jcc_bt<SWI48:mode>_mask_1): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
(*bt<mode>_setncqi_2): Likewise.
(*bt<mode>_setc<mode>_mask): Likewise.

* gcc.c-torture/execute/pr118623.c: New test.

testsuite: Fix two testisms on x86 after PFA [PR118754]

These two tests now vectorize the result-checking
loop with PFA, so the check on the number of
vectorized loops fails.

This fixes them by adding #pragma GCC novector to
the testcases.
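
For illustration, a hypothetical result-checking loop with the pragma
applied.  #pragma GCC novector (supported by recent GCC releases) asks
the vectorizer to leave the following loop alone, so only the loops
under test show up in the dump scans; other compilers may merely warn
about an unknown pragma.

```c
/* Hypothetical check loop of the kind found in vectorizer tests; the
   pragma keeps it out of the vectorized-loop count.  */
static int
check_results (const int *out, const int *expected, int n)
{
#pragma GCC novector
  for (int i = 0; i < n; i++)
    if (out[i] != expected[i])
      return 0;
  return 1;
}
```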

gcc/testsuite/ChangeLog:

PR testsuite/118754
* gcc.dg/vect/vect-tail-nomask-1.c: Add novector.
* gcc.target/i386/pr106010-8c.c: Likewise.

Daily bump.

[PR target/115123] Fix testsuite fallout from sinking heuristic change

Code sinking is just semantics-preserving code motion, so it's a lot like
scheduling in that code motions can change the vector configuration needed at
various program points. That in turn can also change the number of vsetvls as
we may or may not be able to merge them after the code motions.

The sinking heuristics were twiddled several months ago resulting in a handful
of scan-asm failures. This patch adjusts the tests appropriately fixing
pr115123 (P3 regression).

PR target/115123
gcc/testsuite
* gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise.

[PR middle-end/117263] Avoid unused-but-set warning in genautomata

This is a trivial bug where a user wanted to define NDEBUG when building
genautomata, presumably trying to debug its behavior. This resulted in an
unused-but-set warning which caused the build to fail.

Dario included the trivial fixes in the PR which I put through the usual
bootstrap & regression test as well as compiling genautomata with NDEBUG.

Pushing to the trunk.

PR middle-end/117263
gcc/
* genautomata.cc (output_statistics): Avoid set but unused warnings
when compiling with NDEBUG.
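
A hypothetical reproduction of this class of warning and one common fix
(the function is invented, not taken from genautomata.cc): a variable
assigned only for the benefit of an assert becomes set-but-unused once
NDEBUG removes the assert, and a (void) cast keeps it used in both
builds.

```c
#include <assert.h>

static int
sum_checked (const int *v, int n)
{
  int sum = 0;
  int visited; /* only needed by the assert below */

  visited = 0;
  for (int i = 0; i < n; i++)
    {
      sum += v[i];
      visited = i + 1;
    }
  assert (visited == n); /* expands to nothing under -DNDEBUG */
  (void) visited;        /* keeps "visited" used when NDEBUG is defined */
  return sum;
}
```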

Test procedure dummy arguments against global symbols, if available.

This fixes a rather old PR from 2005, where a subroutine
could be passed and called as a function. This patch checks
for that, also for the reverse, and for wrong types of functions.

I expect that this will find a few bugs in dusty deck code...

gcc/fortran/ChangeLog:

PR fortran/24878
* interface.cc (compare_parameter): Check global subroutines
passed as actual arguments for subroutine / function and
function type.

gcc/testsuite/ChangeLog:

PR fortran/24878
* gfortran.dg/interface_51.f90: New test.

[RISC-V][PR target/118146] Fix ICE for unsupported modes

There's some special case code in the risc-v move expander to try and optimize
cases where the source is a subreg of a vector and the destination is a scalar
mode.

The code works fine except when we have no support for the given mode, i.e. HF or
BF when those extensions aren't enabled. We'll end up tripping an assert in
that case when we should have just let standard expansion do its thing.

Tested on my system for rv32 and rv64, but I'll wait for the pre-commit tester
to render a verdict before moving forward.

PR target/118146
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg
of vector source better to avoid ICE.

gcc/testsuite
* gcc.target/riscv/pr118146-1.c: New test.
* gcc.target/riscv/pr118146-2.c: New test.

Daily bump.

ad target/118764: Fix a typo in doc/extend.texi.

gcc/
PR target/118764
* doc/invoke.texi (AVR Options): Fix typos.

[PATCH] OpenMP: Improve Fortran metadirective diagnostics [PR107067]

The Fortran front end was giving an ICE instead of a user-friendly
diagnostic when variants of a metadirective had different
statement associations. The particular test case reported in the issue
also involved invalid placement of the "omp end metadirective" which
was not being diagnosed either.

gcc/fortran/ChangeLog
PR middle-end/107067
* parse.cc (parse_omp_do): Diagnose missing "OMP END METADIRECTIVE"
after loop.
(parse_omp_structured_block): Likewise for strictly structured block.
(parse_omp_metadirective_body): Use better test for variants ending
at different places. Issue a user diagnostic at the end if any
were inconsistent, instead of calling gcc_assert.

gcc/testsuite/ChangeLog
PR middle-end/107067
* gfortran.dg/gomp/metadirective-11.f90: Remove the dg-ice, update
for current behavior, and add more tests to exercise the new error
code.

libgcc: On FreeBSD use GCC's crt objects for static linking

Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's
crt objects for static linking. Otherwise it could mix crtbeginT.o
from the base system with libgcc's crtend.o, possibly leading to
segfaults.

libgcc:
PR target/118685
* config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts.

Signed-off-by: Dimitry Andric <dimitry@andric.com>

GCN, nvptx: 'sorry, unimplemented: exception handling not supported'

For GCN, this avoids ICEs further down the compilation pipeline. For nvptx,
there's effectively no change: in presence of exception handling constructs,
instead of 'sorry, unimplemented: target cannot support nonlocal goto', we
now emit 'sorry, unimplemented: exception handling not supported'.

Additionally, turn test cases into UNSUPPORTED if running into
'sorry, unimplemented: exception handling not supported'.

gcc/
* config/gcn/gcn.md (exception_receiver): 'define_expand'.
* config/nvptx/nvptx.md (exception_receiver): Likewise.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.
* gcc.dg/pr104464.c: Remove GCN XFAIL.
libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.

For a few test cases, clarify dependence on effective-target 'nonlocal_goto' into 'exceptions'

For example, for nvptx, these test cases currently indeed fail with
'sorry, unimplemented: target cannot support nonlocal goto'. However,
that's just an artefact of non-existing support for exception handling,
and these test cases already require effective-target 'exceptions'.

gcc/testsuite/
* gcc.dg/cleanup-12.c: Don't 'dg-skip-if "" { ! nonlocal_goto }'.
* gcc.dg/cleanup-13.c: Likewise.
* gcc.dg/cleanup-5.c: Likewise.
* gcc.dg/gimplefe-44.c: Don't
'dg-require-effective-target nonlocal_goto'.

nvptx doesn't actually support effective-target 'exceptions'

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_exceptions):
'return 0' for '[istarget nvptx-*-*]'.

BPF doesn't actually support effective-target 'exceptions' [PR118772]

PR target/118772
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_exceptions):
'return 0' for '[istarget bpf-*-*]'.

Clarify that effective-targets 'exceptions' and 'exceptions_enabled' are orthogonal

In Subversion r268025 (Git commit 3f21b8e3f7be32dd2b3624a2ece12f84bed545bb)
"Add dg-require-effective-target exceptions", effective-target 'exceptions'
was added, which "says that AMD GCN does not support [exception handling]".

In Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca)
"MSP430: Add -fno-exceptions multilib", effective-target 'exceptions_enabled'
was added "to check if the testing configuration supports exceptions".  Testing
"if exceptions are unsupported or disabled (e.g. by passing -fno-exceptions)"
works as expected if exception handling is disabled at the front-end level
('-fno-exceptions'; the "exceptions are [...] disabled" case):

    exceptions_enabled2066068.cc: In function ‘void foo()’:
    exceptions_enabled2066068.cc:3:27: error: exception handling disabled, use ‘-fexceptions’ to enable

However, effective-target 'exceptions_enabled' additionally assumes that
"If exceptions aren't supported [by the target], then they're not enabled".
This is not correct: it's not unlikely that, in presence of explicit/implicit
'-fexceptions', exception handling code gets fully optimized away by the
compiler, and therefore effective-target 'exceptions_enabled' test cases may
PASS even for targets that don't support effective-target 'exceptions'; these
two effective-targets are orthogonal concepts.

(For completeness: code with trivial instances of C++ exception handling may
translate into simple '__cxa_allocate_exception', '__cxa_throw' function calls
without requiring any back end-level "exceptions magic", and then trigger
unresolved symbols at link time, if these functions are not available.)

This change only affects GCN, as that one currently is the only target declared
as not supporting effective-target 'exceptions'.

gcc/
* doc/sourcebuild.texi (Effective-Target Keywords): Clarify that
effective-target 'exceptions' and 'exceptions_enabled' are
orthogonal.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-prune): Clarify effective-target
'exceptions_enabled'.
* lib/target-supports.exp
(check_effective_target_exceptions_enabled): Don't consider
effective-target 'exceptions'.
libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Clarify
effective-target 'exceptions_enabled'.

'gcc.dg/pr88870.c': don't 'dg-require-effective-target nonlocal_goto'

I confirm that back then, 'gcc.dg/pr88870.c' for nvptx failed due to
'sorry, unimplemented: target cannot support nonlocal goto', however at some
(indeterminate) point in time, that must've disappeared, and we now don't have
to 'dg-require-effective-target nonlocal_goto' anymore, and therefore get:

[-UNSUPPORTED:-]{+PASS:+} gcc.dg/pr88870.c {+(test for excess errors)+}

(And, if ever necessary again, this nowadays probably should
'dg-require-effective-target exceptions' instead of 'nonlocal_goto'.)

gcc/testsuite/
* gcc.dg/pr88870.c: Don't 'dg-require-effective-target nonlocal_goto'.

i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_<code><mode> define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes.  It works fine with
VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
<code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but even though the 8-bit and 16-bit element maxmin
is commutative too, the <mask_codefor><code><mode>3<mask_name>
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative.  So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.

The following patch fixes it by keeping the <code><mode>3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
<code><mode>3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into an * prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.

2025-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*avx512bw_<code><mode>3<mask_name>): ... this.  Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.

* gcc.target/i386/pr118776.c: New test.

x86: Verify that PUSH/POP can be skipped

For

int f(int);

int advance(int dz)
{
    if (dz > 0)
        return (dz + dz) * dz;
    else
        return dz * f(dz);
}

Before r15-1619-g3b9b8d6cfdf593

advance(int):
        push    rbx
        mov     ebx, edi
        test    edi, edi
        jle     .L2
        imul    ebx, edi
        lea     eax, [rbx+rbx]
        pop     rbx
        ret
.L2:
        call    f(int)
        imul    eax, ebx
        pop     rbx
        ret

After

advance(int):
        test    edi, edi
        jle     .L2
        imul    edi, edi
        lea     eax, [rdi+rdi]
        ret
.L2:
        sub     rsp, 24
        mov     DWORD PTR [rsp+12], edi
        call    f(int)
        imul    eax, DWORD PTR [rsp+12]
        add     rsp, 24
        ret

There's no call in the if branch, so it's not optimal to push rbx at the
entry of the function; the push can be sunk into the else branch.  When
"jle .L2" is not taken, the PUSH and POP are skipped entirely.  Update
pr111673.c to verify that this optimization isn't turned off.

PR rtl-optimization/111673
* gcc.target/i386/pr111673.c: Verify that PUSH/POP can be
skipped.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

aarch64: gimple fold aes[ed] [PR114522]

Instead of waiting for combine/RTL optimizations to be fixed here, this
folds the builtins at the gimple level.  It should also give slightly
faster compile times, since the simplification happens earlier.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

PR target/114522
* config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function.
(aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese
and crypto_aesd.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fortran: fix initialization of allocatable non-deferred character [PR59252]

PR fortran/59252

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_trans_subcomponent_assign): Initialize
allocatable non-deferred character with NULL properly.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocatable_char_1.f90: New test.

rs6000: Add cast to avoid pointer to integer comparison warning [PR117674]

2025-02-07 Peter Bergner <bergner@linux.ibm.com>

libgcc/
PR target/117674
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Add cast to
avoid comparison between pointer and integer warning.

Add a cache of recent lines

For larger files the file_cache line index will be spread out to make
the index fit into the fixed buffer, so any access to a line other than
the latest one needs to skip over some lines.

Most line accesses are near the latest line, because a diagnostic is
likely to be near where the scanner is currently lexing.

Add a second cache for recent lines. It is organized as a ring buffer
and maintains the last 256 lines relative to the last input line.

With that, the compile-time cost of enabling -Wmisleading-indentation
for the test case in PR preprocessor/118168 is within run-to-run variation.
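A minimal C++ sketch of the ring-buffer idea, assuming lines are pushed in order and numbered from 1.  The class and member names are hypothetical.  For simplicity the sketch stores copies of the line text, whereas the real m_line_recent in input.cc may store something else (for example, positions in the file buffer).

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: keeps the text of the last `capacity` lines read.
// Assumes capacity > 0 and that lines are pushed in increasing order from 1.
class recent_lines {
public:
  explicit recent_lines(std::size_t capacity) : buf_(capacity) {}

  // Record line LINE_NUM; called once per line as the file is read.
  void push(std::size_t line_num, std::string text) {
    buf_[line_num % buf_.size()] = std::move(text);  // overwrites the oldest slot
    last_ = line_num;
  }

  // Return the line if it is among the last `capacity` lines pushed.
  std::optional<std::string> get(std::size_t line_num) const {
    if (line_num == 0 || line_num > last_ || last_ - line_num >= buf_.size())
      return std::nullopt;  // not read yet, or already overwritten
    return buf_[line_num % buf_.size()];
  }

private:
  std::vector<std::string> buf_;
  std::size_t last_ = 0;  // most recently pushed line number; 0 means none
};
```

A lookup near the latest line is then a single index operation.  Anything older than the window falls back to the (slower) sparse line index.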

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache::m_line_recent,
m_line_recent_first, m_line_recent_last): Add.
(file_cache_slot::evict): Clear new fields.
(file_cache_slot::create): Clear new fields.
(file_cache_slot::file_cache_slot): Initialize new fields.
(file_cache_slot::~file_cache_slot): Release m_line_recent.
(file_cache_slot::get_next_line): Maintain ring buffer of lines
in m_line_recent.
(file_cache_slot::read_line_num): Use m_line_recent to look up
recent lines quickly.

arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089]

For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry.  This saves
a couple of bytes in the resulting image.  This is a relatively niche
case as it's rare to push a single low register onto the stack, but
still worth getting right.

Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.

Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.

Secondly, we iterate over just the non-zero bits in the reg mask
rather than every bit and then checking if there's work to do for that
bit.

Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note.  The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.
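The first two changes can be illustrated with plain bit tricks on an unsigned register mask.  This is a hypothetical sketch, not the arm.cc code, and it uses the GCC/Clang builtin __builtin_ctz.

```cpp
#include <vector>

// True when exactly one bit is set, i.e. a single-register pop that can be
// handled before (and instead of) the main loop.
constexpr bool single_reg_p(unsigned mask) {
  return mask != 0 && (mask & (mask - 1)) == 0;
}

// Visit only the set bits, lowest register first, instead of testing every
// bit position and skipping the ones with no work to do.
std::vector<int> regs_in_mask(unsigned mask) {
  std::vector<int> regs;
  while (mask != 0) {
    regs.push_back(__builtin_ctz(mask));  // index of the lowest set bit
    mask &= mask - 1;                     // clear that bit
  }
  return regs;
}
```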

gcc:
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
Don't emit LDR on thumb2 when POP can be used for smaller code.
Don't add a CFA adjust note when SP is popped off the stack.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c: New test.

arm: fix ICE due to fix for POP {PC} change

My earlier change for making the compiler prefer

POP {PC}

over

LDR PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC. This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.

[rtl-optimization/116244] Don't create bogus regs in alter_subreg

> Jeff Law <jeffreyalaw@gmail.com> writes:
>> So pulling on this thread leads me into the code that sets up
>> ALLOCNO_WMODE in create_insn_allocnos:
>>
>>>            if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
>>>              {
>>>                a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);
>>>                if (outer != NULL && GET_CODE (outer) == SUBREG)
>>>                  {
>>>                    machine_mode wmode = GET_MODE (outer);
>>>                    if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
>>>                      ALLOCNO_WMODE (a) = wmode;
>>>                  }
>>>              }
>> Note how we only set ALLOCNO_MODE only at allocno creation, so it'll
>> work as intended if and only if the first reference is via a SUBREG.
>
> Huh, yeah, I agree that that looks wrong.
>
>> ISTM the fix here is to always do the check and set ALLOCNO_WMODE.
>>[ Snipped discussion on a non-issue. ]

>
> So ISTM that moving the code out of the "if (... == NULL)" should be
> enough on its own.
>
>> And it all makes sense that you caught this.  You and another colleague
>> at ARM were trying to address this exact problem ~11 years ago ;-)
>
> Heh, thought it sounded familiar :)

So attached is the updated patch that adjusts IRA to avoid this problem.
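The following is a hypothetical model of the difference discussed above.  It is not ira-build.cc code: modes are reduced to sizes in bytes, and the names are invented.  The widest access mode is updated on every reference, rather than only when the allocno is first created.

```cpp
#include <map>

// Sizes stand in for machine modes; wmode_size is the widest size in which
// the register is referenced (e.g. through a paradoxical subreg).
struct allocno_model {
  int mode_size;
  int wmode_size;
};

std::map<int, allocno_model> allocnos;

// outer_subreg_size is 0 when the reference is not wrapped in a SUBREG.
void record_reference(int regno, int reg_mode_size, int outer_subreg_size) {
  auto it = allocnos.find(regno);
  if (it == allocnos.end())
    it = allocnos.emplace(regno, allocno_model{reg_mode_size, reg_mode_size}).first;
  // The fix: widen on every reference, not only at creation time.  With the
  // old placement, a first plain reference followed by a wider subreg
  // reference would leave wmode_size too small.
  if (outer_subreg_size > it->second.wmode_size)
    it->second.wmode_size = outer_subreg_size;
}
```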

Georg-Johann, this may explain an issue you were running into as well where you
got an invalid allocation.  I think yours was at the higher end of the register
file, but the core issue is potentially the same (looking at the first use
rather than all of them for paradoxical subregs).

I've had this in my tester about a week.  So it's been through the crosses as
well as various native bootstraps, including but not limited to m68k, ppc,
s390, hppa, sh4, etc.  And just for good measure I bootstrapped & regression
tested it on x86_64 a few minutes ago.

Pushing to the trunk.

PR rtl-optimization/116244
gcc/
* ira-build.cc (create_insn_allocnos): Do not restrict the check for
subreg uses to allocno creation time.  Do it for all uses.

gcc/testsuite/
* g++.target/m68k/m68k.exp: New test driver.
* g++.target/m68k/pr116244.C: New test.

c++: Fix up name independent decl in structured binding handling in range for [PR115586]

cp_parser_range_for temporarily reverts IDENTIFIER_BINDING changes
to hide the decls from the structured bindings from lookup during
parsing of the expression after ':'.
If there are 2 or more name independent decls, we undo IDENTIFIER_BINDING
for the same name multiple times, even when just one has been added
(with a TREE_LIST inside of it as decl).

The following patch fixes it by handling the _ name at most once, the
later loop will DTRT then and just reinstall the temporarily hidden
binding with the TREE_LIST in there.

2025-02-07 Jakub Jelinek <jakub@redhat.com>

PR c++/115586
* parser.cc (cp_parser_range_for): For name independent decls in
structured bindings, only push the name/binding once per
structured binding.

* g++.dg/cpp26/name-independent-decl9.C: New test.
* g++.dg/cpp26/name-independent-decl10.C: New test.

c++: Fix up handling of for/while loops with declarations in condition [PR86769]

As the following testcases show, we weren't correctly handling for/while
loops with declarations as conditions.  (Note: just for-{3,4,6,7,8}.C,
constexpr-86769.C and stmtexpr27.C FAIL without the patch; the rest were
added because I couldn't find coverage for some details and want to make
sure we don't regress, and for5.C is from Marek's attempt in the PR.)

The C++ FE has the simplify_loop_decl_cond function which transforms
such loops as mentioned in the comment:
            while (A x = 42) { }
            for (; A x = 42;) { }
   becomes
            while (true) { A x = 42; if (!x) break; }
            for (;;) { A x = 42; if (!x) break; }
For for loops this is not enough, as the x declaration should be
still in scope when expr (if any) is executed, and injecting the
expr expression into the body in the FE needs to have the continue
label in between, something normally added by the c-family
genericization.  One of my thoughts was to just add there an artificial
label plus the expr expression in the FE and tell c-family about that
label, so that it doesn't create it but uses what has been emitted.

Unfortunately break/continue are resolved to labels only at c-family
genericization time and by moving the condition (and its preparation
statements such as the DECL_EXPR) into the body (and perhaps by also
moving there the (increment) expr as well), we incorrectly resolve any
break/continue statement appearing in cond (or newly perhaps also expr)
expression(s).  While in standard C++ one can't have something like that
there, with statement expressions they are possible there, and we actually
have testsuite coverage that when they appear outside of the body of the
loop they bind to an outer loop rather than the inner one.  When the FE
moves everything into the body, c-family can't distinguish any more between
the user body vs. the condition/preparation statements vs. expr expression.

So, the following patch instead keeps them separate and does the merging
only at the c-family loop genericization time.  For that the patch
introduces two new operands of FOR_STMT and WHILE_STMT, *_COND_PREP
which is forced to be a BIND_EXPR which contains the preparation statements
like DECL_EXPR, and the initialization of that variable, so basically what
{FOR,WHILE}_BODY has when we encounter the function dealing with this,
except one optional CLEANUP_STMT at the end which holds cleanups for the
variable if it needs to be destructed.  This CLEANUP_STMT is removed and
the actual cleanup saved into another new operand, *_COND_CLEANUP.

The c-family loop genericization handles such loops roughly the way
https://eel.is/c++draft/stmt.for and https://eel.is/c++draft/stmt.while
specifies, so the body is (if *_COND_CLEANUP is non-NULL)
{ A x = 42; try { if (!x) break; body; cont_label: expr; } finally { cleanup; } }
and otherwise
{ A x = 42; if (!x) break; body; cont_label: expr; }
i.e. the *_COND, *_BODY, optional continue label and FOR_EXPR are appended
into the body of the *_COND_PREP BIND_EXPR.
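To make the intended order of events concrete, here is a hand-lowered C++ sketch of the shape above for a `for` loop.  It assumes a hypothetical type A and a global trace string; neither comes from the new testcases.  A's destructor stands in for *_COND_CLEANUP, and the goto stands in for a `continue` resolved to cont_label.

```cpp
#include <string>

std::string trace;  // c = construct, d = destroy, b = body, e = increment expr

// Hypothetical condition type.
struct A {
  int v;
  A(int v) : v(v) { trace += 'c'; }
  ~A() { trace += 'd'; }
  explicit operator bool() const { return v < 2; }
};

// Hand-lowered form of
//   for (int i = 0; A x = i; ++i) { if (x.v == 0) continue; trace += 'b'; }
// following { A x = i; if (!x) break; body; cont_label: expr; }
void lowered_loop() {
  int i = 0;
  for (;;) {
    A x = i;                // FOR_COND_PREP: declaration and initialization
    if (!x)
      break;                // FOR_COND is false: leave the loop, x is destroyed
    if (x.v == 0)
      goto cont_label;      // a `continue` in the user's FOR_BODY
    trace += 'b';           // rest of FOR_BODY
  cont_label:
    trace += 'e';           // FOR_EXPR runs while x is still alive
    ++i;
  }                         // x is destroyed at the end of every iteration
}
```

Per [stmt.for], the original `for` loop with the same body should produce the same trace on a compiler that handles condition declarations correctly.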

And when doing constexpr evaluation of such FOR/WHILE loops, we treat
it similarly, first evaluate *_COND_PREP except the
      for (tree decl = BIND_EXPR_VARS (t); decl; decl = DECL_CHAIN (decl))
        destroy_value_checked (ctx, decl, non_constant_p);
part of BIND_EXPR handling for it, then evaluate *_COND (and decide based
on whether it was false or true like before), then *_BODY, then FOR_EXPR,
then *_COND_CLEANUP (similarly to the way how CLEANUP_STMT handling handles
that) and finally do those destroy_value_checked.

Note, the constexpr-86769.C testcase FAILs with both clang++ and MSVC (the
rest of the tests PASS with clang++), but I believe that must just be a bug
in those compilers: new int is done in all the constructors and delete is
done in the destructor, so when clang++ reports that one of the new ints
wasn't deallocated during constexpr evaluation, I don't see how that would
be possible.  When the same test has all the constexpr stuff, all the new
ints are properly deleted at runtime when compiled by both compilers and
valgrind is happy about it, no leaks.

2025-02-07  Jakub Jelinek  <jakub@redhat.com>
    Jason Merrill  <jason@redhat.com>

PR c++/86769
gcc/c-family/
* c-common.def (FOR_STMT): Add 2 operands and document them.
(WHILE_STMT): Likewise.
* c-common.h (WHILE_COND_PREP, WHILE_COND_CLEANUP): Define.
(FOR_COND_PREP, FOR_COND_CLEANUP): Define.
* c-gimplify.cc (genericize_c_loop): Add COND_PREP and COND_CLEANUP
arguments, handle them if they are non-NULL.
(genericize_for_stmt, genericize_while_stmt, genericize_do_stmt):
Adjust callers.
gcc/c/
* c-parser.cc (c_parser_while_statement): Add 2 further NULL_TREE
operands to build_stmt.
(c_parser_for_statement): Likewise.
gcc/cp/
* semantics.cc (set_one_cleanup_loc): New function.
(set_cleanup_locs): Use it.
(simplify_loop_decl_cond): Remove.
(adjust_loop_decl_cond): New function.
(begin_while_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_while_stmt_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_while_stmt): Call do_poplevel also on WHILE_COND_PREP if
non-NULL and also use pop_stmt_list rather than do_poplevel for
WHILE_BODY in that case.  Call set_one_cleanup_loc.
(begin_for_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_for_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_for_stmt): Call do_poplevel also on FOR_COND_PREP if non-NULL
and also use pop_stmt_list rather than do_poplevel for FOR_BODY in
that case.  Call set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Handle
{WHILE,FOR}_COND_{PREP,CLEANUP}.
(check_for_return_continue): Handle {WHILE,FOR}_COND_PREP.
(potential_constant_expression_1): RECUR on
{WHILE,FOR}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/diagnostic/redeclaration-7.C: New test.
* g++.dg/expr/for3.C: New test.
* g++.dg/expr/for4.C: New test.
* g++.dg/expr/for5.C: New test.
* g++.dg/expr/for6.C: New test.
* g++.dg/expr/for7.C: New test.
* g++.dg/expr/for8.C: New test.
* g++.dg/ext/stmtexpr27.C: New test.
* g++.dg/cpp2a/constexpr-86769.C: New test.
* g++.dg/cpp26/name-independent-decl7.C: New test.
* g++.dg/cpp26/name-independent-decl8.C: New test.

jit/118780 - make sure to include dlfcn.h when plugin support is disabled

The following makes dlfcn.h explicitly requested, which avoids a
build failure when JIT is enabled but plugin support is disabled, as
currently the include is conditional on plugin support.
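Here is a single-file illustration of the opt-in pattern.  It is only a sketch: in GCC, the #define lives in the requesting .cc file before system.h is included, and the #ifdef lives in system.h.

```cpp
// In a source file that needs dlopen/dlsym, before the shared header:
#define INCLUDE_DLFCN_H

// In the shared header: the include depends on the explicit request,
// not on whether plugin support was configured.
#ifdef INCLUDE_DLFCN_H
#include <dlfcn.h>
#endif

// Compiles regardless of plugin configuration, because the requesting
// file asked for <dlfcn.h> itself.
constexpr int lazy_flag = RTLD_LAZY;
```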

PR jit/118780
gcc/
* system.h: Check INCLUDE_DLFCN_H for including dlfcn.h instead
of ENABLE_PLUGIN.
* plugin.cc: Define INCLUDE_DLFCN_H.

gcc/jit/
* jit-playback.cc: Define INCLUDE_DLFCN_H.
* jit-result.cc: Likewise.

libstdc++: fix a dangling reference crash in ranges::is_permutation [PR118160]

The code was caching the result of `invoke(proj, *it)` in a local
`auto &&` variable. The problem is that this may create dangling
references, for instance in case `proj` is `std::identity` (the common
case) and `*it` produces a prvalue: lifetime extension does not
apply here due to the expressions involved.

Instead, store (and lifetime-extend) the result of `*it` in a separate
variable, then project that variable. While at it, also forward the
result of the projection to the predicate, so that the predicate can
act on the proper value category.
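The following sketches the pitfall and the fix outside of ranges_algo.h, using hypothetical names.  identity_proj stands in for std::identity, and iota_view_lite stands in for a range whose elements are prvalues.  The projection is called directly rather than through std::invoke.

```cpp
#include <utility>

// Stand-in for std::identity: returns its argument unchanged, by reference.
struct identity_proj {
  template <class T>
  constexpr T&& operator()(T&& t) const noexcept { return std::forward<T>(t); }
};

// A range whose operator[] returns by value, like an iterator yielding prvalues.
struct iota_view_lite {
  int first;
  int operator[](int i) const { return first + i; }
};

// Dangling version (do not do this):
//   auto&& v = proj(r[i]);   // v binds to the temporary r[i] through the
//                            // returned reference; it dies at the `;`
// Fixed version: lifetime-extend the element, project the named variable,
// then forward the projection to the predicate.
template <class Range, class Proj, class Pred>
int count_if_projected(const Range& r, int n, Proj proj, Pred pred, int target) {
  int count = 0;
  for (int i = 0; i < n; ++i) {
    auto&& elem = r[i];              // prvalue bound here: lives for the iteration
    auto&& projected = proj(elem);   // refers to elem, which is still alive
    if (pred(std::forward<decltype(projected)>(projected), target))
      ++count;
  }
  return count;
}
```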

libstdc++-v3/ChangeLog:

PR libstdc++/118160
PR libstdc++/100249
* include/bits/ranges_algo.h (__is_permutation_fn): Avoid a
dangling reference by storing the result of the iterator
dereference and the result of the projection in two distinct
variables, in order to lifetime-extend each one.
Forward the projected value to the predicate.
* testsuite/25_algorithms/is_permutation/constrained.cc: Add a
test with a range returning prvalues. Test it in a constexpr
context, in order to rely on the compiler to catch UB.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>

libstdc++: Handle exceptions in std::ostream::sentry destructor

Because basic_ostream::sentry::~sentry is implicitly noexcept, we can't
let any exceptions escape from it, or the program would terminate. If
the streambuf's sync() function throws, or if it returns an error and
setting badbit in the stream state throws, then the program would
terminate.

LWG 835 intended to prevent exceptions from being thrown by the
std::basic_ostream::sentry destructor, but failed to cover the case
where the streambuf's sync() member throws an exception. LWG 4188 is
needed to fix that part. In any case, LWG 835 was never implemented for
libstdc++ so this does that, as well as my proposed fix for 4188 (that
badbit should be set if pubsync() exits via an exception).

In order to avoid a second try-catch block to handle an exception that
might be thrown by setting badbit, this introduces an RAII helper class
that temporarily clears the stream's exceptions mask, then restores it
afterwards.

The try-catch block doesn't handle the forced_unwind exception
explicitly, because catching and rethrowing that would just terminate
when it reached the sentry's implicit noexcept(true) anyway.
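The following sketches the same idea using only public iostream APIs; the guard class and free function are hypothetical.  libstdc++'s _Disable_exceptions can restore the mask directly.  The public exceptions(mask) setter, by contrast, also calls clear(rdstate()), which may throw, so this guard's destructor has to swallow that.  The free function also omits the unitbuf and uncaught-exception checks that the real sentry destructor performs.

```cpp
#include <ios>
#include <ostream>
#include <streambuf>

// Empties the stream's exceptions mask for its lifetime, then restores it.
class exceptions_disabled {
public:
  explicit exceptions_disabled(std::ios& s) : s_(s), saved_(s.exceptions()) {
    s_.exceptions(std::ios_base::goodbit);  // cannot throw: the mask is now empty
  }
  ~exceptions_disabled() {
    try {
      s_.exceptions(saved_);  // sets the mask first, then clear(rdstate()) may throw
    } catch (...) {
      // Destructors are implicitly noexcept, so nothing may escape.
    }
  }
  exceptions_disabled(const exceptions_disabled&) = delete;
  exceptions_disabled& operator=(const exceptions_disabled&) = delete;

private:
  std::ios& s_;
  std::ios_base::iostate saved_;
};

// Flush via pubsync() and record any failure in the stream state.
void sync_without_throwing(std::ostream& os) noexcept {
  std::streambuf* sb = os.rdbuf();
  if (!sb)
    return;
  exceptions_disabled guard(os);
  try {
    if (sb->pubsync() == -1)
      os.setstate(std::ios_base::badbit);  // sync() reported an error
  } catch (...) {
    os.setstate(std::ios_base::badbit);    // sync() threw: proposed LWG 4188 behaviour
  }
}
```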

libstdc++-v3/ChangeLog:

* include/bits/ostream.h (basic_ostream::_Disable_exceptions):
RAII helper type.
(basic_ostream::sentry::~sentry): Use _Disable_exceptions. Add
try-catch block around call to pubsync.
* testsuite/27_io/basic_ostream/exceptions/char/lwg4188.cc: New
test.
* testsuite/27_io/basic_ostream/exceptions/wchar_t/lwg4188.cc:
New test.

libstdc++: Add comment about use of always_inline attributes [PR111050]

Add a comment referencing PR 111050, to ensure the fix made by
r12-9903-g1be57348229666 doesn't get reverted.

libstdc++-v3/ChangeLog:

PR libstdc++/111050
* include/bits/hashtable_policy.h (_Hash_node_value_base): Add
comment about always_inline attributes.

RISC-V: Make VXRM as global register [PR118103]

Inspired by PR118103: the VXRM register should be treated almost the
same as the FRM register, i.e. as a cooperatively-managed global register.
Thus, add VXRM to global_regs so that the late-combine pass does not
eliminate writes to it.

For example, given the code below:

  void compute ()
  {
    size_t vl = __riscv_vsetvl_e16m1 (N);
    vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
    vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
    vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

    __riscv_vse16_v_u16m1 (c, vc, vl);
  }

  int main ()
  {
    initialize ();
    compute ();

    return 0;
  }

After compiling with -march=rv64gcv -O3, we get:

  compute:
        csrwi       vxrm,2
        lui         a3,%hi(a)
        lui         a4,%hi(b)
        addi        a4,a4,%lo(b)
        vsetivli    zero,4,e16,m1,ta,ma
        addi        a3,a3,%lo(a)
        vle16.v     v2,0(a4)
        vle16.v     v1,0(a3)
        lui         a4,%hi(c)
        addi        a4,a4,%lo(c)
        vaaddu.vv   v1,v1,v2
        vse16.v     v1,0(a4)
        ret
        .size       compute, .-compute
        .section    .text.startup,"ax",@progbits
        .align      1
        .globl      main
        .type       main, @function
  main:
        # csrwi     vxrm,2 deleted after inlining
        addi        sp,sp,-16
        sd          ra,8(sp)
        call        initialize
        lui         a3,%hi(a)
        lui         a4,%hi(b)
        vsetivli    zero,4,e16,m1,ta,ma
        addi        a4,a4,%lo(b)
        addi        a3,a3,%lo(a)
        vle16.v     v2,0(a4)
        vle16.v     v1,0(a3)
        lui         a4,%hi(c)
        addi        a4,a4,%lo(c)
        li          a0,0
        vaaddu.vv   v1,v1,v2

The following test suites pass with this patch:
* The rv64gcv full regression test.

PR target/118103

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the VXRM as the global_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

[testsuite] tolerate later success [PR108357]

On leon3-elf and presumably on other targets, the test fails due to
differences in calling conventions and other reasons, that add extra
gimple stmts that prevent the expected optimization at the expected
point. The optimization takes place anyway, just a little later, so
tolerate that.

for gcc/testsuite/ChangeLog

PR tree-optimization/108357
* gcc.dg/tree-ssa/pr108357.c: Tolerate later optimization.

aarch64: Fix bootstrap with --enable-checking=release [PR118771]

With release checking we get a maybe-uninitialized warning inside
aarch64_split_move because of jump threading for the case `npieces == 0`,
but `npieces` is never 0 (there is no way for the compiler to know that).
So this fixes the issue by adding a `gcc_assert` to the function which
asserts that `npieces > 0`, which silences the warning.

Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).

The warning:

aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
3418 |   if (reg_overlap_mentioned_p (dst_pieces[0], src))
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
aarch64.cc:3408:20: note: 'dst_pieces' declared here
3408 |   auto_vec<rtx, 4> dst_pieces, src_pieces;
      |                    ^~~~~~~~~~

PR target/118771
gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is
greater than 0.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Honor dump options for C/C++ '-fdump-tree-original'

In addition to upcoming use of '-fdump-tree-original-lineno', this patch
actually resolves XFAILs for 'c-c++-common/goacc/pr92793-1.c', which had
gotten added as part of commit fa410314ec94c9df2ad270c1917adc51f9147c2c
"[OpenACC] Elaborate testcases that verify column location information [PR92793]".

gcc/c-family/
* c-gimplify.cc (c_genericize): Pass 'local_dump_flags' to
'print_c_tree'.
* c-pretty-print.cc (c_pretty_printer::statement): Pass
'dump_flags' to 'dump_generic_node'.
(c_pretty_printer::c_pretty_printer): Initialize 'dump_flags'.
(print_c_tree): Add 'dump_flags_t' formal parameter.
(debug_c_tree): Adjust.
* c-pretty-print.h (c_pretty_printer): Add 'dump_flags_t
dump_flags'.
(c_pretty_printer::c_pretty_printer): Add 'dump_flags_t' formal
parameter.
(print_c_tree): Adjust.
gcc/testsuite/
* c-c++-common/goacc/pr92793-1.c: Remove
'-fdump-tree-original-lineno' XFAILs.

c++: ICE with unparsed noexcept [PR117106]

In a member-specification of a class, a noexcept-specifier is
a complete-class context.  Thus we delay parsing until the end of
the class via our DEFERRED_PARSE mechanism; see cp_parser_save_noexcept
and cp_parser_late_noexcept_specifier.

We also attempt to defer instantiation of noexcept-specifiers in order
to reduce the number of instantiations; this is done via DEFERRED_NOEXCEPT.

We can even have both, as in noexcept65.C: a DEFERRED_PARSE wrapped in
DEFERRED_NOEXCEPT, which uses the DEFPARSE_INSTANTIATIONS mechanism.
noexcept65.C works, because when we really need the noexcept, which is
when parsing the body of S::A::A(), the noexcept will have been parsed
already; noexcepts are parsed before the bodies of member functions.

But in this test we have:

  struct A {
      int x;
      template<class>
      void foo() noexcept(noexcept(x)) {}
      auto bar() -> decltype(foo<int>()) {} // #1
  };

and I think the decltype in #1 needs the unparsed noexcept before it
could have been parsed.  clang++ rejects the test and I suppose we
should reject it as well, rather than crashing on a DEFERRED_PARSE
in tsubst_expr.

PR c++/117106
PR c++/118190

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Give an error if the noexcept
hasn't been parsed yet.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept89.C: New test.
* g++.dg/cpp0x/noexcept90.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Properly support null pointer constants in conditional operators [PR118282]

We've been rejecting the following valid code since GCC 4

=== cut here ===
struct A {
  explicit A (int);
  operator void* () const;
};
void foo (const A& x) {
  auto res = 0 ? x : 0;
}
int main () {
  A a{5};
  foo(a);
}
=== cut here ===

The problem is that for COND_EXPR, add_builtin_candidate has an early
return if the true and false values are not pointers, and that check does
not take null pointer constants into account.  As a result we don't find
any valid conversion and fail to compile.

This patch fixes the condition to also pass if the true/false values are
not pointers but null pointer constants, which resolves the PR.

PR c++/118282

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate): Also check for null_ptr_cst_p
operands.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/op8.C: New test.

c++: Don't use CLEANUP_EH_ONLY for new expression cleanup [PR118763]

The following testcase is miscompiled since r12-6325 stopped
preevaluating the initializers for new expressions.
If evaluating the initializers throws, there is a correct cleanup
for that, but it is marked CLEANUP_EH_ONLY.  While that is fine in
standard C++, an initializer containing statement expressions can
return or goto out of the expression, and we should delete the
pointer in that case too.

There is already a sentry variable initialized to true and
set to false after everything is initialized and used as a guard
for the cleanup, so just removing the CLEANUP_EH_ONLY flag does
everything we need.  And in the normal case of the initializer
not using statement expressions, at least with -O2 we get the same code:
the change turns
try { sentry = true; ... sentry = false; } catch { if (sentry) delete ...; }
into
try { sentry = true; ... sentry = false; } finally { if (sentry) delete ...; }
and optimizations will see that sentry is false when reaching the finally
other than through an exception.
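C++ itself has no `finally`, but the same sentry-guarded cleanup can be modelled with a destructor.  A destructor runs on normal exit, on an early return, and during unwinding alike.  The following is a hypothetical sketch with invented names; the bail_out flag stands in for a statement expression that returns out of the initializer.

```cpp
#include <new>

int frees = 0;  // how many times the guard released the storage

struct new_guard {
  void* mem;
  bool sentry = true;  // stays true until initialization has finished
  ~new_guard() {       // runs on every exit path, like `finally`
    if (sentry) {
      ::operator delete(mem);
      ++frees;
    }
  }
};

int* make_int(int value, bool bail_out) {
  new_guard g{::operator new(sizeof(int))};
  if (bail_out)
    return nullptr;                  // an EH-only (catch) cleanup would leak here
  int* p = new (g.mem) int(value);   // construct in the allocated storage
  g.sentry = false;                  // initialization done: keep the storage
  return p;
}
```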

Though, I wonder what other CLEANUP_EH_ONLY cleanups might be an issue
with statement expressions.

2025-02-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/118763
* init.cc (build_new_1): Don't set CLEANUP_EH_ONLY.

* g++.dg/asan/pr118763.C: New test.

c++: Use cplus_decl_attributes rather than decl_attributes in grokdecl [PR118773]

My r15-3046 change regressed the first half of the following testcase.
When it calls decl_attributes, it doesn't handle attributes with
dependent arguments correctly, and so during template parsing we now
reject the testcase, claiming that N is not a constant integer.

I had actually followed the pointer/reference case, which did that
too, and that one has been failing on the second part of the testcase
for a couple of years.

Note, there is also
          if (decl_context != PARM && decl_context != TYPENAME)
            /* Assume that any attributes that get applied late to
               templates will DTRT when applied to the declaration
               as a whole.  */
            late_attrs = splice_template_attributes (&attrs, type);
          returned_attrs = decl_attributes (&type,
                                            attr_chainon (returned_attrs,
                                                          attrs),
                                            attr_flags);
          returned_attrs = attr_chainon (late_attrs, returned_attrs);
call directly to decl_attributes in grokdeclarator, but this one handles
the splicing manually, so maybe it is ok as is (and I don't have a testcase
of anything misbehaving for that).

2025-02-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/118773
* decl.cc (grokdeclarator): Use cplus_decl_attributes rather than
decl_attributes for std_attributes on pointer and array types.

* g++.dg/cpp0x/gen-attrs-87.C: New test.
* g++.dg/gomp/attrs-3.C: Adjust expected diagnostics.

c++: Allow constexpr reads from volatile std::nullptr_t objects [PR118661]

As mentioned in the PR, https://eel.is/c++draft/conv.lval#note-1
says that even volatile reads from std::nullptr_t typed objects actually
don't read anything and https://eel.is/c++draft/expr.const#10.9
says that even those are ok in constant expressions.

So, the following patch adjusts the r9-4793 changes to have an exception
for NULLPTR_TYPE.
As [conv.lval]/3 also talks about accessing an inactive member, I've added
a testcase to cover that as well.

2025-02-07 Jakub Jelinek <jakub@redhat.com>

PR c++/118661
* constexpr.cc (potential_constant_expression_1): Don't diagnose
lvalue-to-rvalue conversion of volatile lvalue if it has NULLPTR_TYPE.
* decl2.cc (decl_maybe_constant_var_p): Return true for constexpr
decls with NULLPTR_TYPE even if they are volatile.

* g++.dg/cpp0x/constexpr-volatile4.C: New test.
* g++.dg/cpp0x/constexpr-union9.C: New test.

Fortran: Fix default init of finalizable derived args [PR116829]

2025-02-07 Tomáš Trnka <trnka@scm.com>

gcc/fortran
PR fortran/116829
* trans-decl.cc (init_intent_out_dt): Always call
gfc_init_default_dt() for BT_DERIVED to apply s->value if the
symbol isn't allocatable. Also simplify the logic a bit.

gcc/testsuite/
PR fortran/116829
* gfortran.dg/derived_init_7.f90: New test.

tree-optimization/115538 - possible wrong-code with SLP conversion

The following fixes a latent issue where we use ranges to verify
correctness of a vector conversion optimization. We rely on ranges
from 'op0' which for SLP is extracted from the representative stmt
which does not necessarily correspond to any actual scalar operation.
We also do not verify the range of all scalar lanes in the SLP
operand match. The following rectifies this, restricting the support
to single-lane SLP nodes at this point - on branches we'd simply
not perform this optimization with SLP.

PR tree-optimization/115538
* tree-vectorizer.h (vect_get_slp_scalar_def): Declare.
* tree-vect-slp.cc (vect_get_slp_scalar_def): New helper.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): For SLP
correctly look at ranges of the scalar defs of the SLP operand.
(supportable_indirect_convert_operation): Likewise.

[gcn] Fix the output amdhsa.version

The amdhsa.version depends on the code object version: V3 had 1.0,
V4 has 1.1, and V5 (and V6) have 1.2. GCC emitted 1.0, but for a while
now it has generated either V4 or, with -march=gfx...-generic, V6 code
objects. Now it emits the proper version again.
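The version mapping described above can be written as a small lookup. This is only a sketch. The function name `amdhsa_metadata_version` and the pair return type are hypothetical and do not reflect how gcn.cc actually emits the directive.

```cpp
#include <utility>

// Hypothetical helper: map an AMD GPU code object version to the
// amdhsa.version metadata value {major, minor}, following the mapping
// stated in the commit message (V3 -> 1.0, V4 -> 1.1, V5/V6 -> 1.2).
// Returns {0, 0} for versions that mapping does not cover.
std::pair<int, int> amdhsa_metadata_version(int code_object_version) {
  switch (code_object_version) {
    case 3: return {1, 0};
    case 4: return {1, 1};
    case 5:
    case 6: return {1, 2};
    default: return {0, 0};
  }
}
```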

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Update
'amdhsa.version' output to match used code version.
* config/gcn/gen-gcn-device-macros.awk: Add a comment to
crosslink.

[GCN] Handle generic ISA names in libgomp's plugin-gcn.c

libgomp/ChangeLog:

* plugin/plugin-gcn.c (ELFABIVERSION_AMDGPU_HSA_V6,
EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET,
GET_GENERIC_VERSION): New #define.
(elf_gcn_isa_is_generic): New.
(isa_matches_agent): Accept all generic code objects on the first
go; extend the diagnostic and handle the runtime-failed case.
(create_and_finalize_hsa_program): Also call it after loading
the code has failed, passing the status.
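The new macros extract a "generic version" from the ELF header. Below is a sketch of such a check. The macro names come from the ChangeLog, but their values are an assumption taken from LLVM's AMDGPU ELF documentation: the generic version occupies the top byte of e_flags, and the V6 ABI version is 4. The helper functions are hypothetical and not the plugin's actual code.

```cpp
#include <cstdint>

// Assumed values (from LLVM's AMDGPU ELF layout, not from this log).
constexpr std::uint8_t ELFABIVERSION_AMDGPU_HSA_V6 = 4;
constexpr std::uint32_t EF_AMDGPU_GENERIC_VERSION_V = 0xff000000u;
constexpr unsigned EF_AMDGPU_GENERIC_VERSION_OFFSET = 24;

// Extract the generic version field from an ELF header's e_flags.
constexpr std::uint32_t get_generic_version(std::uint32_t e_flags) {
  return (e_flags & EF_AMDGPU_GENERIC_VERSION_V)
         >> EF_AMDGPU_GENERIC_VERSION_OFFSET;
}

// Hypothetical check: treat a code object as generic if it uses the V6
// ABI (or later) and carries a nonzero generic version.
constexpr bool isa_is_generic(std::uint8_t abi_version, std::uint32_t e_flags) {
  return abi_version >= ELFABIVERSION_AMDGPU_HSA_V6
         && get_generic_version(e_flags) != 0;
}
```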

LoongArch: Correct the mode for mask{eq,ne}z

For mask{eq,ne}z, rk is always compared with 0 in the full width, thus
the mode for rk should be X.

I found the issue while reviewing a patch fixing a similar issue for
RISC-V XTheadCondMov [1]. Interestingly, I cannot find a test case that
actually blows up on LoongArch, but as the issue is obvious enough, let's
fix it anyway so it won't blow up in the future.
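The following sketch shows why the full-width comparison matters. It models maskeqz/masknez on 64-bit registers, assuming the usual definitions: maskeqz yields 0 when rk is zero and rj otherwise, and masknez yields 0 when rk is nonzero and rj otherwise. These are illustrative models, not GCC code. If rk held an SImode value whose upper 32 bits were stale, the 64-bit comparison could give a different answer than a 32-bit one.

```cpp
#include <cstdint>

// Model of LoongArch maskeqz rd, rj, rk: rd = (rk == 0) ? 0 : rj.
// The comparison looks at all 64 bits of rk.
std::uint64_t maskeqz(std::uint64_t rj, std::uint64_t rk) {
  return rk == 0 ? 0 : rj;
}

// Model of LoongArch masknez rd, rj, rk: rd = (rk != 0) ? 0 : rj.
std::uint64_t masknez(std::uint64_t rj, std::uint64_t rk) {
  return rk != 0 ? 0 : rj;
}
```

For example, with rk = 0x100000000 the low 32 bits are zero, yet maskeqz returns rj because the instruction sees a nonzero 64-bit value.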

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674004.html

gcc/ChangeLog:

* config/loongarch/loongarch.md
(*sel<code><GPR:mode>_using_<GPR2:mode>): Rename to ...
(*sel<code><GPR:mode>_using_<X:mode>): ... here.
(GPR2): Remove as nothing uses it now.

[ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

If decode_field_reference finds a load that accesses past the inner
object's size, bail out.

Drop the too-strict assert.
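The bounds condition behind this bail-out can be sketched as a bit-range check. The function below is hypothetical. It is not the interface of access_in_bounds_of_type_p, whose signature this log does not show.

```cpp
#include <cstdint>

// Hypothetical helper: does an access of `bitsize` bits starting at bit
// `bitpos` stay within an object of `object_bits` bits?  Written so that
// bitpos + bitsize cannot overflow.
bool access_within_object(std::uint64_t object_bits, std::uint64_t bitpos,
                          std::uint64_t bitsize) {
  return bitpos <= object_bits && bitsize <= object_bits - bitpos;
}
```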

for  gcc/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gimple-fold.cc (decode_field_reference): Refuse to consider
merging out-of-bounds BIT_FIELD_REFs.
(make_bit_field_load): Drop too-strict assert.
* tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
(access_in_bounds_of_type_p): ... this.  Change interface,
export.
(tree_could_trap_p): Adjust.
* tree-eh.h (access_in_bounds_of_type_p): Declare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gcc.dg/field-merge-25.c: New.

[gcn] Add gfx9-generic and generic-associated gfx*

This patch adds gfx9-generic, completing the gfx*-generic support.
It also adds all gfx* devices that are part of any of the gfx*-generic,
i.e. gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.

gcc/ChangeLog:

* config/gcn/gcn-devices.def (GCN_DEVICE): Add gfx9-generic,
gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
Add a currently unused column linking a specific ISA to a generic
one (if it exists).
* config/gcn/gcn-tables.opt: Regenerate.
* doc/invoke.texi (AMD GCN): Add the new gfx... and the older
gfx{10-3,11}-generic to -march= as 'experimental'.

[gcn] Fix gfx906's sramecc setting

When compiling with -g, mkoffload.cc creates a device object file itself;
however, so that the linker does not complain, the ELF flags must
match those used by the compiler / linker. For gfx906, the assembler defaults
to sramecc = any, but gcn-devices.def specified 'unsupported', which is not
the same, causing link errors. This is a regression caused by commit
r15-4540-ga6b26e5ea09779, as can best be seen by looking at its
changes to mkoffload.cc.

Additionally, this commit adds '...' to the GCN_DEVICE #define in gcn.cc
to make it agnostic to the addition of fields.

gcc/ChangeLog:

* config/gcn/gcn-devices.def (GCN_DEVICE): Change sramecc for
gfx906 to 'any'.
* config/gcn/gcn.cc (GCN_DEVICE): Add trailing ... to #define.

[testsuite] [sparc] select ultrasparc for fsmuld test

vis3move-3.c expects fsmuld, which is not available on all variants of
sparc.  Select a cpu that supports it for the test.

Now, -mfix-ut699 irrevocably disables fsmuld, so skip the test if the
test configuration uses that option.

for  gcc/testsuite/ChangeLog

* gcc.target/sparc/vis3move-3.c: Select ultrasparc.  Skip with
-mfix-ut699.

[testsuite] [sparc] skip tls tests if emulated

A number of tls tests expect TLS-specific relocations, which are not
present when tls is emulated, e.g. on leon3-elf. Skip the tests
when tls is emulated.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/tls-ld-int16.c: Skip when tls is emulated.
* gcc.target/sparc/tls-ld-int32.c: Likewise.
* gcc.target/sparc/tls-ld-int8.c: Likewise.
* gcc.target/sparc/tls-ld-uint16.c: Likewise.
* gcc.target/sparc/tls-ld-uint32.c: Likewise.
* gcc.target/sparc/tls-ld-uint8.c: Likewise.

[testsuite] [sparc] skip sparc-ret-1 with -mfix-ut699

Option -mfix-ut699 changes the set of instructions that can be placed
in the delay slot, preventing the expected insn placement. Skip the
test if the option is present.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/sparc-ret-1.c: Skip on -mfix-ut699.

[testsuite] [sparc] use -mtune in alignment tuning test

If -mcpu=leon3 is present on the command line for a test run,
overriding it with -mcpu=niagara7 is not enough to override the tuning
for leon3 selected by the previous -mcpu option.

niagara7-align.c tests for niagara7 alignment tuning, so use -mtune
rather than -mcpu.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/niagara7-align.c: Use -mtune.

ira: Add a target hook for callee-saved register cost scale

commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
Date:   Tue Jun 25 08:37:49 2024 -0500

    ira: Scale save/restore costs of callee save registers with block frequency

scales the cost of saving/restoring a callee-save hard register in the
epilogue and prologue by the entry block frequency, which, if not optimizing
for size, is 10000 for all targets.  As a result, callee-saved registers
may not be used to preserve local variable values across calls on some
targets, like x86.  Add a target hook for the callee-saved register cost
scale in the epilogue and prologue used by IRA.  The default version of this
target hook returns 1 if optimizing for size; otherwise it returns the entry
block frequency.  Add an x86 version of this target hook to restore the
old behavior prior to the above commit.
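Here is a minimal sketch of what the default hook is described as returning, and of how such a scale multiplies a callee-saved register's prologue/epilogue save/restore cost. The function names and the plain integer cost model are assumptions for illustration. They are not the target hook's real signature or the code in ira-color.cc, and the x86 override is not modelled.

```cpp
// Hypothetical model of the default scale described above:
// 1 when optimizing for size, otherwise the entry block frequency.
int default_callee_saved_cost_scale(bool optimize_for_size,
                                    int entry_block_freq) {
  return optimize_for_size ? 1 : entry_block_freq;
}

// Hypothetical: cost charged for using one callee-saved register, i.e.
// its prologue/epilogue save/restore cost multiplied by the scale.
long callee_saved_register_cost(int save_restore_cost, bool optimize_for_size,
                                int entry_block_freq) {
  return static_cast<long>(save_restore_cost)
         * default_callee_saved_cost_scale(optimize_for_size, entry_block_freq);
}
```

With the entry block frequency of 10000 mentioned above, a save/restore cost of 4 becomes 40000 when not optimizing for size. That large value is what can make a callee-saved register look unattractive to the allocator.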

PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/117082
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Call the target hook for the
callee-saved register cost scale in epilogue and prologue.
* target.def (ira_callee_saved_register_cost_scale): New target
hook.
* targhooks.cc (default_ira_callee_saved_register_cost_scale):
New.
* targhooks.h (default_ira_callee_saved_register_cost_scale):
Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
New.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
New.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

[PATCH] RISC-V: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST to correct enum

stack_protect_{set,test}_<mode> were showing up in RTL dumps as
UNSPEC_COPYSIGN and UNSPEC_FMV_X_W due to UNSPEC_SSP_SET and
UNSPEC_SSP_TEST being put in the unspecv enum instead of unspec.
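The following sketch shows why a misplaced constant prints under an unrelated name. It assumes that the unspec and unspecv enums are numbered independently and that a dump of a plain unspec looks its code up in the unspec name table. The enums and tables below are small hypothetical stand-ins, not GCC's generated code.

```cpp
#include <string>

// Independently numbered enums (hypothetical subsets).
enum unspec  { UNSPEC_A, UNSPEC_B, UNSPEC_C };
enum unspecv { UNSPECV_X, UNSPECV_Y, UNSPEC_SSP_SET };  // misplaced entry

const char *const unspec_names[]  = { "UNSPEC_A", "UNSPEC_B", "UNSPEC_C" };
const char *const unspecv_names[] = { "UNSPECV_X", "UNSPECV_Y", "UNSPEC_SSP_SET" };

// A dumper for non-volatile unspecs only consults the unspec table, so the
// numeric code is interpreted as an index into that table.
std::string dump_unspec_name(int code) { return unspec_names[code]; }

// A dumper for volatile unspecs consults the unspecv table instead.
std::string dump_unspecv_name(int code) { return unspecv_names[code]; }
```

UNSPEC_SSP_SET has the value 2 here. Used inside a plain unspec, it is therefore printed as UNSPEC_C, much like the stack-protector patterns showed up as UNSPEC_COPYSIGN and UNSPEC_FMV_X_W.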

gcc/ChangeLog:

* config/riscv/riscv.md: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST
to unspec enum.

[RISC-V] Fix risc-v expected test output after recent iv changes

Richard S's recent change to iv increment insertion removed a reg->reg move
(which was its intent AFAICT).  This triggered a failure on a riscv test.

That test was meant to verify that we didn't have an extraneous reg->reg move
due to a buglet in the risc-v splitters.  Before the 2023 change we had two
vector reg->reg moves and after the 2023 fix we had just one.  With Richard's
change we have none ;-)  Adjusting test accordingly.

Pushed to the trunk.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/madd-split2-1.c: Update expected
output.

avr.opt.urls += -mcvt

gcc/
* config/avr/avr.opt.urls: Add mcvt.

middle-end: Remove unused internal function after IVopts cleanup [PR118756]

It seems that after my IVopts patches the function contain_complex_addr_expr
became unused and clang is rightfully complaining about it.

This removes the unused internal function.

gcc/ChangeLog:

PR tree-optimization/118756
* tree-ssa-loop-ivopts.cc (contain_complex_addr_expr): Remove.

Fortran: Fix handling of the X edit descriptor.

This patch is a partial fix for the handling of X edit descriptors
when combined with certain T edit descriptors.

PR libfortran/114618

libgfortran/ChangeLog:

* io/transfer.c (formatted_transfer_scalar_write): Change name
of variable 'pos' to 'tab_pos' to improve clarity. Add new
variable next_pos when calculating the maximum position.
Update the calculation of pending spaces.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114618.f90: New test.

c++: Add no_unique_address attribute further test coverage [PR110345]

Another non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-no_unique_address1.C: New test.

c++: Add noreturn attribute further test coverage [PR110345]

Another non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-noreturn1.C: New test.

c++: Add nodiscard attribute further test coverage [PR110345]

Fairly non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-nodiscard1.C: New test.

AVR: Add support for a Compact Vector Table (-mcvt).

Some AVR devices support a CVT:

- Devices from the 0-series, 1-series, 2-series.
- AVR16, AVR32, AVR64, AVR128 devices.

The support is provided by means of a startup code file
crt<mcu>-cvt.o from AVR-LibC v2.3 that can be linked instead
of the traditional crt<mcu>.o.

This patch adds a new command line option -mcvt that links
that CVT startup code (or issues an error when the device
doesn't support a CVT).

PR target/118764
gcc/
* config/avr/avr.opt (-mcvt): New target option.
* config/avr/avr-arch.h (AVR_CVT): New enum value.
* config/avr/avr-mcus.def: Add AVR_CVT flag for devices that
support it.
* config/avr/avr.cc (avr_handle_isr_attribute) [TARGET_CVT]: Issue
an error when a vector number larger than 3 is used.
* config/avr/gen-avr-mmcu-specs.cc (McuInfo.have_cvt): New property.
(print_mcu) <*avrlibc_startfile>: Use crt<mcu>-cvt.o depending
on -mcvt (or issue an error when the device doesn't support a CVT).
* doc/invoke.texi (AVR Options): Document -mcvt.

Fortran: Fix ICE in associate with elemental function [PR118750]

2025-02-06 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/118750
* resolve.cc (resolve_assoc_var): If the target expression has
a rank, do not use gfc_expression_rank, since it will return 0
if the function is elemental. Resolution will have produced the
correct rank.

gcc/testsuite/
PR fortran/118750
* gfortran.dg/associate_72.f90: New test.

loop-iv, riscv: Fix get_biv_step_1 for RISC-V [PR117506]

The following test ICEs on RISC-V, at least latently, since
r14-1622-g99bfdb072e67fa3fe294d86b4b2a9f686f8d9705, which added a
RISC-V specific case to get_biv_step_1 to also recognize
({zero,sign}_extend:DI (plus:SI op0 op1))

The reason for the ICE is that op1 in this case is a CONST_POLY_INT,
which, unlike the expected VOIDmode CONST_INTs, has its own
mode and still satisfies CONSTANT_P.
GET_MODE (rhs) (SImode) is different from outer_mode (DImode), so
the function later does
        *inner_step = simplify_gen_binary (code, outer_mode,
                                           *inner_step, op1);
but that obviously ICEs because while *inner_step is either VOIDmode
or DImode, op1 has SImode.

The following patch fixes it by extending op1 using code (the
{ZERO,SIGN}_EXTEND code) so that simplify_gen_binary can handle it.
Another option would be to change the !CONSTANT_P (op1) check 3 lines
above this to !CONST_INT_P (op1); I think it isn't very likely that we
get something useful from other constants there.

2025-02-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117506
* loop-iv.cc (get_biv_step_1): For {ZERO,SIGN}_EXTEND
of PLUS apply {ZERO,SIGN}_EXTEND to op1.

* gcc.dg/pr117506.c: New test.
* gcc.target/riscv/pr117506.c: New test.

AVR: genmultilib.awk - Use more robust parsing of spaces.

gcc/
PR target/118768
* config/avr/genmultilib.awk: Parse the AVR_MCU lines in
a more robust way w.r.t. white spaces.

LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561].

PR target/118561

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc
(loongarch_expand_builtin_lsx_test_branch):
NULL_RTX will not be returned when an error is detected.
(loongarch_expand_builtin): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118561.c: New test.

vect: Move induction IV increments [PR110449]

In this PR, we used to generate:

  .L6:
        mov     v30.16b, v31.16b
        fadd    v31.4s, v31.4s, v27.4s
        fadd    v29.4s, v30.4s, v28.4s
        stp     q30, q29, [x0]
        add     x0, x0, 32
        cmp     x1, x0
        bne     .L6

for an unrolled induction in:

  for (int i = 0; i < 1024; i++)
    {
      arr[i] = freq;
      freq += step;
    }

with the problem being the unnecessary MOV.

The main induction IV was incremented by VF * step == 2 * nunits * step,
and then nunits * step was added for the second store to arr.

The original patch for the PR (r14-2367-g224fd59b2dc8) avoided the MOV
by incrementing the IV by nunits * step twice.  The problem with that
approach is that it doubles the loop-carried latency.  This change was
deliberately not preserved when moving from loop-vect to SLP and so
the test started failing again after r15-3509-gd34cda720988.

I think the main problem is that we put the IV increment in the wrong
place.  Normal IVs created by create_iv are placed before the exit
condition where possible, but vectorizable_induction instead always
inserted them at the start of the loop body.  The only use of the
incremented IV is by the phi node, so the effect is to make both
the old and new IV values live for the whole loop body, which is
why we need the MOV.

The simplest fix therefore seems to be to reuse the create_iv logic.
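The loop shape after the change can be emulated in scalar C++. The second vector is computed from the current IV value, and the IV is advanced by VF * step only after its last use, just before the exit test. This is an illustrative emulation with a hypothetical 4-lane vector type (NUNITS = 4, matching the .4s operations above), not vectorizer output.

```cpp
#include <cstddef>

constexpr int NUNITS = 4;         // lanes per vector, as in the .4s operations
constexpr int VF = 2 * NUNITS;    // two vectors stored per iteration

struct vec4 { float lane[NUNITS]; };   // hypothetical stand-in for a vector register

vec4 vadd(vec4 a, vec4 b) {
  for (int k = 0; k < NUNITS; k++) a.lane[k] += b.lane[k];
  return a;
}

vec4 vdup(float x) {
  vec4 r;
  for (int k = 0; k < NUNITS; k++) r.lane[k] = x;
  return r;
}

void vstore(float *p, vec4 v) {
  for (int k = 0; k < NUNITS; k++) p[k] = v.lane[k];
}

// Emulates `arr[i] = freq; freq += step;` for n elements (n a multiple of VF).
void fill_induction(float *arr, std::size_t n, float freq, float step) {
  vec4 iv;                                        // {freq, freq+step, ...}
  for (int k = 0; k < NUNITS; k++) iv.lane[k] = freq + k * step;
  const vec4 nunits_step = vdup(NUNITS * step);   // offset of the second vector
  const vec4 vf_step = vdup(VF * step);           // per-iteration IV increment

  for (std::size_t i = 0; i < n; i += VF) {
    vstore(&arr[i], iv);
    vstore(&arr[i + NUNITS], vadd(iv, nunits_step));
    // The increment follows the last use of the old IV value, so the old
    // and new values are never live at the same time and no copy is needed.
    iv = vadd(iv, vf_step);
  }
}
```

In general, this form can round differently from the sequential scalar additions, so the checks on this sketch use exactly representable values.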

gcc/
PR tree-optimization/110449
* tree-ssa-loop-manip.h (insert_iv_increment): Declare.
* tree-ssa-loop-manip.cc (insert_iv_increment): New function,
split out from...
(create_iv): ...here and generalized to gimple_seqs.
* tree-vect-loop.cc (vectorizable_induction): Use
standard_iv_increment_position and insert_iv_increment
to insert the IV increment.

gcc/testsuite/
PR tree-optimization/110449
* gcc.target/aarch64/pr110449.c: Expect an increment by 8.0,
but test that there is no MOV.

rtl-optimization/117922 - disable fold-mem-offsets for highly connected CFG

The PR shows fold-mem-offsets taking ages and a lot of memory computing
DU/UD chains, as that requires the RD (reaching definitions) problem. The
issue is not so much the memory required for the pruned sets as the high
CFG connectivity (and the fact that the CFG is cyclic), which makes solving
the dataflow problem expensive.

The following adds the same limit as the one imposed by GCSE and CPROP.
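Here is a sketch of such a connectivity guard. It assumes the GCSE heuristic of rejecting a CFG that has more than 20000 + 4 * (number of basic blocks) edges. Whether the patch uses exactly this expression is an assumption here, and the function is hypothetical.

```cpp
// Hypothetical guard: a typical CFG has only a small multiple of its block
// count in edges, so skip the pass when the edge count exceeds
// 20000 + 4 * blocks (assumed to mirror the GCSE/CPROP limit).
bool cfg_too_connected(unsigned long n_edges, unsigned long n_basic_blocks) {
  return n_edges > 20000 + n_basic_blocks * 4;
}
```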

PR rtl-optimization/117922
* fold-mem-offsets.cc (pass_fold_mem_offsets::execute):
Do nothing for a highly connected CFG.

tree-optimization/118749 - bogus alignment peeling causes misaligned access

The vectorizer thinks it can align a vector access to 16 bytes when
using a vectorization factor of 8 and 1 byte elements. That of
course does not work for the 2nd vector iteration. Apparently we
lack a guard against such nonsense.
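The missing guard can be stated as a divisibility condition. Peeling only makes the first vector access aligned. Each vector iteration then advances by VF * element size bytes, so the target alignment holds for every iteration only if that stride is a multiple of it. The helper below is a hypothetical sketch, not the new vector_alignment_reachable_p logic.

```cpp
#include <cstdint>

// Hypothetical check: after peeling aligns the first access, each vector
// iteration advances by vf * elem_size bytes; the target alignment is kept
// for all iterations only if that stride is a multiple of it.
bool alignment_maintained_by_peeling(std::uint64_t vf, std::uint64_t elem_size,
                                     std::uint64_t target_align) {
  return target_align != 0 && (vf * elem_size) % target_align == 0;
}
```

With VF = 8 and 1-byte elements, the stride is 8 bytes. A 16-byte-aligned first iteration is therefore followed by an access at offset 8.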

PR tree-optimization/118749
* tree-vect-data-refs.cc (vector_alignment_reachable_p): Pass
in the vectorization factor; when that cannot maintain
the DR's target alignment, do not claim we can reach that
by peeling.

* gcc.dg/vect/pr118749.c: New testcase.

Daily bump.

[committed] Disable ABS instruction on bfin port

I was looking at a regression on the bfin port with a recent change to IRA
and stumbled across this while doing a general port health evaluation.

The ABS instruction in the blackfin ISA is defined as saturating on INT_MIN,
which is a bit unexpected. We certainly can't use it when -fwrapv is enabled.
Given the failures on the C23 uabs tests, I'm inclined to just disable the
pattern completely.

Fixes pr23047, uabs-2 and uabs-3.

While it's not a regression, it's the blackfin port, so I think we've got a
higher degree of freedom here.

Pushing to the trunk.

gcc/
* config/bfin/bfin.md (abssi): Disable pattern.

c++: Reject default arguments for template class friend functions [PR118319]

We segfault upon the following invalid code

=== cut here ===
template <int> struct S {
  friend void foo (int a = []{}());
};
void foo (int a) {}
int main () {
  S<0> t;
  foo ();
}
=== cut here ===

The problem is that we end up with a LAMBDA_EXPR callee in
set_flags_from_callee, and dereference its NULL_TREE
TREE_TYPE (TREE_TYPE (..)).

This patch sets the default argument to error_mark_node and gives a hard
error for template class friend functions that do not meet the
requirement in C++17 11.3.6/4 (the change is restricted to templates per
discussion with Jason).

PR c++/118319

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Inspect all friend function parameters.
If it's not valid for them to have a default value and we're
processing a template, set the default value to error_mark_node
and give a hard error.

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg18.C: New test.
* g++.dg/parse/defarg18a.C: New test.

[PR115568][LRA]: Use more strict output reload check in rematerialization

In this PR case, LRA rematerialized a value from an inheritance insn
instead of from the output reload insn.  This resulted in considering a
rematerialization candidate value available when it was actually
not.  As a consequence, an insn after rematerialization used the
unexpected value, and this use resulted in an fp exception.  The patch
fixes this bug.

gcc/ChangeLog:

PR rtl-optimization/115568
* lra-remat.cc (create_cands): Check that output reload insn is
adjacent to given insn.  Update a comment.

gcc/testsuite/ChangeLog:

PR rtl-optimization/115568
* gcc.target/i386/pr115568.c: New.

go: update builtin function attributes

PR go/118746
* go-gcc.cc (class Gcc_backend): Define builtin_cold,
builtin_leaf, builtin_nonnull. Alphabetize constants.
(Gcc_backend::Gcc_backend): Update attributes for builtin
functions to match builtins.def.
(Gcc_backend::define_builtin): Split out attribute setting into
set_attributes.
(Gcc_backend::set_attributes): New method split out of
define_builtin. Support new flag values.

aarch64: Fix sve/acle/general/ldff1_8.c failures

gcc.target/aarch64/sve/acle/general/ldff1_8.c and
gcc.target/aarch64/sve/ptest_1.c were failing because the
aarch64 port was giving a zero (unknown) cost to instructions
that compute two results in parallel.  This was latent until
r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs
as unknown.

A long-standing todo here is to make insn_cost derive costs from md
information, rather than having to write a lot of matching code in
aarch64_rtx_costs.  But that's not something we can do for GCC 15.

This patch instead treats the cost of a PARALLEL as being the maximum
cost of its constituent sets.  I don't like this very much, since it
isn't really target-specific behaviour.  If it were stage 1, I'd be
trying to change pattern_cost instead.
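The costing rule itself is simple. Below is a sketch that uses a hypothetical, flattened representation of a PARALLEL as a list of per-SET costs. The real aarch64_insn_cost works on RTL, and this function is not its code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical: each element is the cost already computed for one SET
// inside a PARALLEL.  The PARALLEL is charged the cost of its costliest
// SET.  An empty list yields 0, which rtl-ssa treats as "unknown".
int parallel_cost(const std::vector<int> &set_costs) {
  int cost = 0;
  for (int c : set_costs)
    cost = std::max(cost, c);
  return cost;
}
```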

gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs
the same cost as the costliest SET.