git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Conditionally use floating-point fetch_add builtins

- Some hardware has support for floating point atomic fetch_add (and
  similar).
- There are existing compilers targetting this hardware that use
  libstdc++ -- e.g. NVC++.
- Since the libstdc++ atomic<float>::fetch_add and similar is written
  directly as a CAS loop these compilers can not emit optimal code when
  seeing such constructs.
- I hope to use __atomic_fetch_add builtins on floating point types
  directly in libstdc++ so these compilers can emit better code.
- Clang already handles some floating point types in the
  __atomic_fetch_add family of builtins.
- In order to only use this when available, I originally thought I could
  check against the resolved versions of the builtin in a manner
  something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`.
  I then realised that clang does not expose resolved versions of these
  atomic builtins to the user.
  From the clang discourse it was suggested we instead use SFINAE (which
  clang already supports).
- I have recently pushed a patch for allowing the use of SFINAE on
  builtins: r15-6042-g9ed094a817ecaf
  Now that patch is committed, this patch does not change what happens
  for GCC, while it uses the builtin for codegen with clang.
- I have previously sent a patchset upstream adding the ability to use
  __atomic_fetch_add and similar on floating point types.
  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html
  Once that patchset is upstream (plus the automatic linking of
  libatomic as Joseph pointed out in the email below
  https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
  then current GCC should start to use the builtin branch added in this
  patch.

So *currently*, this patch allows external compilers (NVC++ in
particular) to generate better code, and similarly lets clang understand
the operation better since it maps to a known builtin.

I hope that by GCC 16 this patch would also allow GCC to understand the
operation better via mapping to a known builtin.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_fetch_addable): Define
new concept.
(__atomic_impl::__fetch_add_flt): Use new concept to make use of
__atomic_fetch_add when available.
(__atomic_fetch_subtractable, __fetch_sub_flt): Likewise.
(__atomic_add_fetchable, __add_fetch_flt): Likewise.
(__atomic_sub_fetchable, __sub_fetch_flt): Likewise.

Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>

c++: fix propagating REF_PARENTHESIZED_P [PR116379]

Here we have:

  template<typename T>
  struct X{
      T val;
      decltype(auto) value(){
  return (val);
      }
  };

where the return type of value should be 'int &' since '(val)' is an
expression, not a name, and decltype(auto) performs the type deduction
using the decltype rules.

The problem is that we weren't propagating REF_PARENTHESIZED_P
correctly: the return value of finish_non_static_data_member in this
test was a REFERENCE_REF_P, so we didn't set the flag.  We should
use force_paren_expr like below.

PR c++/116379

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <COMPONENT_REF>: Use force_paren_expr to set
REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto9.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

tree: Fix up the DECL_VALUE_EXPR GC marking [PR118790]

The ggc_set_mark call in gt_value_expr_mark_2 is actually wrong, that
just marks the VAR_DECL itself, but doesn't mark the subtrees of it (type
etc.).  So, I think we need to test gcc_marked_p for whether it is marked
or not, if not marked walk the DECL_VALUE_EXPR and then gt_ggc_mx mark
the VAR_DECL that was determined not marked and needs to be marked now.
One option would be to call gt_ggc_mx (t) right after the DECL_VALUE_EXPR
walking, but I'm a little bit worried that the subtree marking could mark
other VAR_DECLs (e.g. seen from DECL_SIZE or TREE_TYPE and the like) and
if they would be DECL_HAS_VALUE_EXPR_P we might not walk their
DECL_VALUE_EXPR anymore later.
So, the patch defers the gt_ggc_mx calls until we've walked all the
DECL_VALUE_EXPRs directly or indirectly connected to already marked
VAR_DECLs.

2025-02-14  Jakub Jelinek  <jakub@redhat.com>

PR debug/118790
* tree.cc (struct gt_value_expr_mark_data): New type.
(gt_value_expr_mark_2): Don't call ggc_set_mark, instead check
ggc_marked_p.  Treat data as gt_value_expr_mark_data * with pset
in it rather than address of the pset itself and push to be marked
VAR_DECLs into to_mark vec.
(gt_value_expr_mark_1): Change argument from hash_set<tree> *
to gt_value_expr_mark_data * and find pset in it.
(gt_value_expr_mark): Pass to traverse_noresize address of
gt_value_expr_mark_data object rather than hash_table<tree> and
for all entries in the to_mark vector after the traversal call
gt_ggc_mx.

LoongArch: Adjust the cost of ADDRESS_REG_REG.

After changing this cost from 1 to 3, the performance of spec2006
401 473 416 465 482 can be improved by about 2% on LA664.

Add option '-maddr-reg-reg-cost='.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Add
option '-maddr-reg-reg-cost='.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize
addr_reg_reg_cost to 3.
* config/loongarch/loongarch-opts.cc
(loongarch_target_option_override): If '-maddr-reg-reg-cost='
is not used, set it to the initial value.
* config/loongarch/loongarch-tune.h
(struct loongarch_rtx_cost_data): Add the member
addr_reg_reg_cost and its assignment function to the structure
loongarch_rtx_cost_data.
* config/loongarch/loongarch.cc (loongarch_address_insns):
Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* doc/invoke.texi: Add description of '-maddr-reg-reg-cost='.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/const-double-zero-stx.c: Add
'-maddr-reg-reg-cost=1'.
* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.

LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843].

PR target/118843

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc
(loongarch_update_cpp_builtins): Fix macro definition issues.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118843.c: New test.

LoongArch: After setting the compilation options, update the predefined macros.

PR target/118828

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse):
Update the predefined macros.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118828.c: New test.
* gcc.target/loongarch/pr118828-2.c: New test.
* gcc.target/loongarch/pr118828-3.c: New test.
* gcc.target/loongarch/pr118828-4.c: New test.

LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

Split the implementation of the function loongarch_cpu_cpp_builtins into two parts:
1. Macro definitions that do not change (only considering 64-bit architecture)
2. Macro definitions that change with different compilation options.

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (builtin_undef): New macro.
(loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins
and loongarch_define_unconditional_macros.
(loongarch_def_or_undef): New functions.
(loongarch_define_unconditional_macros): Likewise.
(loongarch_update_cpp_builtins): Likewise.

LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

gcc/ChangeLog:

* config/loongarch/loongarch-target-attr.cc
(loongarch_pragma_target_parse): Move to ...
(loongarch_register_pragmas): Move to ...
* config/loongarch/loongarch-c.cc
(loongarch_pragma_target_parse): ... here.
(loongarch_register_pragmas): ... here.
* config/loongarch/loongarch-protos.h
(loongarch_process_target_attr): Function Declaration.

tree-optimization/90579 - avoid STLF fail by better optimizing

For the testcase in question which uses a fold-left vectorized
reduction of a reverse iterating loop we'd need two forwprop
invocations to first bypass the permute emitted for the reverse
iterating loop and then to decompose the vector load that only
feeds element extracts. The following moves the first transform
to a match.pd pattern and makes sure we fold the element extracts
when the vectorizer emits them so the single forwprop pass can
then pick up the vector load decomposition, avoiding the forwarding
fail that causes.

Moving simplify_bitfield_ref also makes forwprop remove the dead
VEC_PERM_EXPR via the simple-dce it uses - this was also
previously missing.

PR tree-optimization/90579
* tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
match.pd.
(pass_forwprop::execute): Adjust.
* match.pd (bit_field_ref (vec_perm ...)): New pattern
modeled after simplify_bitfield_ref.
* tree-vect-loop.cc (vect_expand_fold_left): Fold the
element extract stmt, combining it with the vector def.

* gcc.target/i386/pr90579.c: New testcase.

c++: Clear lambda scope for unattached member template lambdas

In r15-7202 we made lambdas between a template parameter scope and a
class/function/initializer be considered TU-local, in lieu of working
out how to mangle them to the succeeding declaration.

I neglected to clear any existing mangling on the template declaration
however; this means that such lambdas can occasionally get a lambda
scope, and will in general inherit the lambda scope of their
instantiation context (whatever that might be).

This patch ensures that the scope is cleared on the template declaration
as well.

gcc/cp/ChangeLog:

* lambda.cc (record_lambda_scope): Clear mangling scope for
otherwise unattached lambdas in class member templates.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval22.C: Add check that the primary
specialisation of the lambda is TU-local.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Fix mangling of lambas in static member template initializers [PR107741]

My fix for this issue in r15-7147 turns out to not be quite sufficient;
static member templates apparently go down a different code path and
need their own handling.

PR c++/107741

gcc/cp/ChangeLog:

* cp-tree.h (is_static_data_member_initialized_in_class):
Declare new predicate.
* decl2.cc (start_initialized_static_member): Push the
TEMPLATE_DECL when appropriate.
(is_static_data_member_initialized_in_class): New predicate.
(finish_initialized_static_member): Use it.
* lambda.cc (record_lambda_scope): Likewise.
* parser.cc (cp_parser_init_declarator): Start the member decl
early for static members so that lambda scope is set.
(cp_parser_template_declaration_after_parameters): Don't
register in-class initialized static members here.

gcc/testsuite/ChangeLog:

* g++.dg/abi/lambda-ctx2-19.C: Add tests for template members.
* g++.dg/abi/lambda-ctx2-19vs20.C: Likewise.
* g++.dg/abi/lambda-ctx2-20.C: Likewise.
* g++.dg/abi/lambda-ctx2.h: Likewise.
* g++.dg/cpp0x/static-member-init-1.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

RISC-V: Avoid more unsplit insns in const expander [PR118832].

Hi,

in PR118832 we have another instance of the problem already noticed in
PR117878.  We sometimes use e.g. expand_simple_binop for vector
operations like shift or and.  While this is usually OK, it causes
problems when doing it late, e.g. during LRA.

In particular, we might rematerialize a const_vector during LRA, which
then leaves an insn laying around that cannot be split any more if it
requires a pseudo.  Therefore we should only use the split variants
in expand_const_vector.

This patch fixed the issue in the PR and also pre-emptively rewrites two
other spots that might be prone to the same issue.

Regtested on rv64gcv_zvl512b.  As the two other cases don't have a test
(so might not even trigger) I unconditionally enabled them for my testsuite
run.

Regards
Robin

PR target/118832

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector):  Expand as
vlmax insn during lra.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118832.c: New test.

jit: add "final override" to diagnostic sink [PR116613]

I added class jit_diagnostic_listener in r15-4760-g0b73e9382ab51c
but forgot to annotate one of the vfuncs with "override".

Fixed thusly.

gcc/jit/ChangeLog:
PR other/116613
* dummy-frontend.cc
(jit_diagnostic_listener::on_report_diagnostic): Add
"final override".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

driver: -fhardened and -z lazy/-z norelro [PR117739]

As the manual states, using "-fhardened -fstack-protector" will produce
a warning because -fhardened wants to enable -fstack-protector-strong,
but it can't since it's been overriden by the weaker -fstack-protector.

-fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
produce the same warning.  But we don't detect this combination, so
this patch fixes it.  I also renamed a variable to better reflect its
purpose.

Also don't check warn_hardened in process_command, since it's always
true there.

Also tweak wording in the manual as Jon Wakely suggested on IRC.

PR driver/117739

gcc/ChangeLog:

* doc/invoke.texi: Tweak wording for -Whardened.
* gcc.cc (driver_handle_option): If -z lazy or -z norelro was
specified, don't enable linker hardening.
(process_command): Don't check warn_hardened.

gcc/testsuite/ChangeLog:

* c-c++-common/fhardened-16.c: New test.
* c-c++-common/fhardened-17.c: New test.
* c-c++-common/fhardened-18.c: New test.
* c-c++-common/fhardened-19.c: New test.
* c-c++-common/fhardened-20.c: New test.
* c-c++-common/fhardened-21.c: New test.

Reviewed-by: Jakub Jelinek <jakub@redhat.com>

testsuite: adjust nontype-class72 for implicit constexpr

This test added by r15-7507 doesn't get some expected diagnostics if we
implicitly make I(E) constexpr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class72.C: Disable -fimplicit-constexpr.

dwarf: emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack [PR70536]

Per https://wiki.dwarfstd.org/C++0x_Variadic_templates.md
DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:

17$:      DW_TAG_formal_parameter_pack
              DW_AT_name("args")
18$:          DW_TAG_formal_parameter
                  ! no DW_AT_name attribute
                  DW_AT_type(reference to 13$)
(...)

PR c++/70536

gcc/ChangeLog:

* dwarf2out.cc (gen_formal_parameter_pack_die): Add name attr.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/template-func-params-7.C: Check for pack names.

Co-authored-by: Jason Merrill <jason@redhat.com>

c++: use -Wprio-ctor-dtor for attribute init_priority

gcc/cp/ChangeLog:

* tree.cc (handle_init_priority_attribute): Use OPT_prio_ctor_dtor.

gcc/testsuite/ChangeLog:

* g++.dg/special/initp1.C: Test disabling -Wprio-ctor-dtor.

c++: omp declare variant tweak

In r15-6707 I changed this function to use build_stub_object to more simply
produce the right type, but it occurs to me that forward_parm would be even
better, specifically for the diagnostic.

This changes nothing with respect to PR118791.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Use forward_parm.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/declare-variant-3.C: Adjust diagnostic.
* g++.dg/gomp/declare-variant-5.C: Adjust diagnostic.

Fix LAPACK build error due to global symbol checking.

This was an interesting regression.  It came from my recent
patch, where an assert was triggered because a procedure artificial
dummy argument generated for a global symbol did not have the
information if if was a function or a subroutine.  Fixed by
adding the information in gfc_get_formal_from_actual_arglist.

This information then uncovered some new errors, also in the
testsuite, which needed fixing.  Finally, the error is made to
look a bit nicer, so the user gets a pointer to where the
original interface comes from.

gcc/fortran/ChangeLog:

PR fortran/118845
* interface.cc (compare_parameter): If the formal attribute has been
generated from an actual argument list, also output an pointer to
there in case of an error.
(gfc_get_formal_from_actual_arglist): Set function and subroutine
attributes and (if it is a function) the typespec from the actual
argument.

gcc/testsuite/ChangeLog:

PR fortran/118845
* gfortran.dg/recursive_check_4.f03: Adjust call so types matche.
* gfortran.dg/recursive_check_6.f03: Likewise.
* gfortran.dg/specifics_2.f90: Adjust calls so types match.
* gfortran.dg/interface_52.f90: New test.
* gfortran.dg/interface_53.f90: New test.

c++: -frange-for-ext-temps and reused temps [PR118856]

Some things in the front-end use a TARGET_EXPR to create a temporary, then
refer to its TARGET_EXPR_SLOT separately later; in this testcase,
maybe_init_list_as_range does. So we need to handle that pattern in
extend_all_temps.

PR c++/118856

gcc/cp/ChangeLog:

* call.cc (struct extend_temps_data): Add var_map.
(extend_all_temps): Adjust.
(set_up_extended_ref_temp): Make walk_data void*.
(extend_temps_r): Remap variables. Handle pset here.
Extend all TARGET_EXPRs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/range-for9.C: New test.

c++: P2308, Template parameter initialization (tests) [PR113800]

This proposal was implemented a long time ago by my r9-5271,
but it took me this long to verify that it still works as per P2308.

This patch adds assorted tests, both from clang and from [temp.arg.nontype].
Fortunately I did not discover any issues in the compiler.

PR c++/113800
DR 2450

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing15.C: New test.
* g++.dg/cpp2a/nontype-class68.C: New test.
* g++.dg/cpp2a/nontype-class69.C: New test.
* g++.dg/cpp2a/nontype-class70.C: New test.
* g++.dg/cpp2a/nontype-class71.C: New test.
* g++.dg/cpp2a/nontype-class72.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]

The following testcase ICEs, because we have multiple levels of
DECL_VALUE_EXPR VAR_DECLs:
  character(kind=1) id_string[1:.id_string] [value-expr: *id_string.55];
  character(kind=1)[1:.id_string] * id_string.55 [value-expr: FRAME.107.id_string.55];
  integer(kind=8) .id_string [value-expr: FRAME.107..id_string];
id_string is the user variable mentioned in BLOCK_VARS, it has
DECL_VALUE_EXPR because it is a VLA, id_string.55 is a temporary created by
gimplify_vla_decl as the address that points to the start of the VLA, what
is normally used in the IL to access it.  But as this artificial var is then
used inside of a nested function, tree-nested.cc adds DECL_VALUE_EXPR to it
too and moves the actual value into the FRAME.107 object's member.
Now, remove_unused_locals removes id_string.55 (and various other VAR_DECLs)
from cfun->local_decls, simply because it is not mentioned in the IL at all
(neither is id_string itself, but that is kept in BLOCK_VARS as it has
DECL_VALUE_EXPR).  So, after this point, id_string.55 tree isn't referenced from
anywhere but id_string's DECL_VALUE_EXPR.  Next GC collection is triggered,
and we are unlucky enough that in the value_expr_for_decl hash table
(underlying hash map for DECL_VALUE_EXPR) the id_string.55 entry comes
before the id_string entry.  id_string is ggc_marked_p because it is
referenced from BLOCK_VARS, but id_string.55 is not, as we don't mark
DECL_VALUE_EXPR anywhere but by gt_cleare_cache on value_expr_for_decl.
But gt_cleare_cache does two things, it calls clear_slots on entries
where the key is not ggc_marked_p (so the id_string.55 mapping to
FRAME.107.id_string.55 is lost and DECL_VALUE_EXPR (id_string.55) becomes
NULL) but then later we see id_string entry, which is ggc_marked_p, so mark
the whole hash table entry, which sets ggc_set_mark on id_string.55.  But
at this point its DECL_VALUE_EXPR is lost.
Later during dwarf2out.cc we want to emit DW_AT_location for id_string, see
it has DECL_VALUE_EXPR, so emit it as indirection of id_string.55 for which
we again lookup DECL_VALUE_EXPR as it has DECL_HAS_VALUE_EXPR_P, but as it
is NULL, we ICE, instead of finding it is a subobject of FRAME.107 for which
we can find its stack location.

Now, as can be seen in the PR, I've tried to tweak tree-ssa-live.cc so that
it would keep id_string.55 in cfun->local_decls; that prohibits it from
the DECL_VALUE_EXPR of it being GC until expansion, but then we shrink and
free cfun->local_decls completely and so GC at that point still can throw
it away.

The following patch adds an extension to the GTY ((cache)) option, before
calling the gt_cleare_cache on some hash table by specifying
GTY ((cache ("somefn"))) it calls somefn on that hash table as well.
And this extra hook can do any additional ggc_set_mark needed so that
gt_cleare_cache preserves everything that is actually needed and throws
away the rest.

In order to make it just 2 pass rather than up to n passes - (if we had
say
id1 -> something, id2 -> x(id1), id3 -> x(id2), id4 -> x(id3), id5 -> x(id4)
in the value_expr_for_decl hash table in that order (where idN are VAR_DECLs
with DECL_HAS_VALUE_EXPR_P, id5 is the only one mentioned from outside and
idN -> X stands for idN having DECL_VALUE_EXPR X, something for some
arbitrary tree and x(idN) for some arbitrary tree which mentions idN
variable) and in each pass just marked the to part of entries with
ggc_marked_p base.from we'd need to repeat until we don't mark anything)
the patch calls walk_tree on DECL_VALUE_EXPR of the marked trees and if it
finds yet unmarked tree, it marks it and walks its DECL_VALUE_EXPR as well
the same way.

2025-02-13  Jakub Jelinek  <jakub@redhat.com>

PR debug/118790
* gengtype.cc (write_roots): Remove cache variable, instead break from
the loop on match and test o for NULL.  If the cache option has
non-empty string argument, call the specified function with v->name
as argument before calling gt_cleare_cache on it.
* tree.cc (gt_value_expr_mark_2, gt_value_expr_mark_1,
gt_value_expr_mark): New functions.
(value_expr_for_decl): Use GTY ((cache ("gt_value_expr_mark"))) rather
than just GTY ((cache)).
* doc/gty.texi (cache): Document optional argument of cache option.

* gfortran.dg/gomp/pr118790.f90: New test.

arm: gimple fold aes[ed] [PR114522]

Almost a copy/paste from the recent aarch64 version of this patch,
this one is a bit more intrusive because it also introduces
arm_general_gimple_fold_builtin.

With this patch,
gcc.target/arm/aes_xor_combine.c scan-assembler-not veor
passes again.

gcc/ChangeLog:

PR target/114522
* config/arm/arm-builtins.cc (arm_fold_aes_op): New function.
(arm_general_gimple_fold_builtin): New function.
* config/arm/arm-builtins.h (arm_general_gimple_fold_builtin): New
prototype.
* config/arm/arm.cc (arm_gimple_fold_builtin): Call
arm_general_gimple_fold_builtin as needed.

c++: Constrain visibility for CNTTPs with internal types [PR118849]

While looking into PR118846 I noticed that we don't currently constrain
the linkage of functions involving CNTTPs of internal-linkage types. It
seems to me that this would be sensible to do.

PR c++/118849

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r): Constrain visibility according to
the type of decl_constant_var_p decls.

gcc/testsuite/ChangeLog:

* g++.dg/template/linkage6.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

testsuite: Add another range for coroutines testcase [PR118574]

This patch adds another range for coroutine testcase, which doesn't
extend (across co_await) just the __for_range var and what it binds
to (so passes even without -frange-for-ext-temps), but also some other
temporaries and verifies they are destructed in the right order.

2025-02-13 Jakub Jelinek <jakub@redhat.com>

PR c++/118574
* g++.dg/coroutines/range-for2.C: New test.

c++: Fix up regressions caused by for/while loops with declarations [PR118822]

The recent PR86769 r15-7426 changes regressed the following two testcases,
the first one is more important as it is derived from real-world code.

The first problem is that the chosen
prep = do_pushlevel (sk_block);
// emit something
body = push_stmt_list ();
// emit further stuff
body = pop_stmt_list (body);
prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable.  If during parsing a label is seen in the body and then
some decl with destructors, sk_cleanup transparent scope is added, but
the correspondiong result from push_stmt_list is saved in
*current_binding_level and pop_stmt_list then pops even that statement list
but only do_poplevel actually attempts to pop the sk_cleanup scope and so we
ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL) and doing do_pushlevel (sk_block); do_pushlevel
(sk_cleanup); wouldn't work either as do_poplevel would silently unwind even
the cleanup one.

The second problem is that my assumption that the declaration in the
condition will have zero or one cleanup is just wrong, at least for
structured bindings used as condition, there can be as many cleanups as
there are names in the binding + 1.

So, the following patch changes the earlier approach.  Nothing is removed
from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond,
push_stmt_list isn't called either; all it does is remember as an integer
the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs)
from querying stmt_list_stack and finding the initial *body_p in there
(that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily
{FOR,WHILE}_BODY is set to the last statement (if any) in the innermost
STATEMENT_LIST at the adjust_loop_decl_cond time; then at
finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of
do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and finds
given {FOR,WHILE}_COND_CLEANUP number and {FOR,WHILE}_BODY tree the right
spot where body statements start and moves that into {FOR,WHILE}_BODY.
Finally genericize_c_loop then inserts the cond, body, continue label, expr
into the right subtree of {FOR,WHILE}_COND_PREP.
The constexpr evaluation unfortunately had to be changed as well, because
we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ())
right away, we want to evaluate it with the exception of the CLEANUP_STMT
cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels), and defer
the evaluation of the cleanups until after cond, body, expr are evaluated.

2025-02-13  Jakub Jelinek  <jakub@redhat.com>

PR c++/118822
PR c++/118833
gcc/
* tree-iterator.h (tsi_split_stmt_list): Declare.
* tree-iterator.cc (tsi_split_stmt_list): New function.
gcc/c-family/
* c-common.h (WHILE_COND_CLEANUP): Change description in comment.
(FOR_COND_CLEANUP): Likewise.
* c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP
being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth
instead of actual cleanup.
gcc/cp/
* semantics.cc (adjust_loop_decl_cond): Allow multiple trailing
CLEANUP_STMT levels in *BODY_P.  Set *CLEANUP_P to the number
of levels rather than one particular cleanup, keep the cleanups
in *PREP_P.  Set *BODY_P to the last stmt in the cur_stmt_list
or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty.
(finish_loop_cond_prep): New function.
(finish_while_stmt, finish_for_stmt): Use it.  Don't call
set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Adjust handling of
{FOR,WHILE}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/expr/for9.C: New test.
* g++.dg/cpp26/decomp12.C: New test.

build: Remove HAVE_LD_EH_FRAME_CIEV3

Old versions of Solaris ld and GNU ld didn't support CIEv3 in .eh_frame.
To avoid this breaking the build

[build] Default to DWARF 4 on Solaris if linker supports CIEv3
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00669.html

checked for the necessary linker support, defaulting to DWARF-2 if
necessary. Solaris ld was fixed in Solaris 11.1, GNU ld in binutils
2.16, so this is long obsolete and only used in Solaris code anyway.

This patch thus removes both the configure check and
solaris_override_options.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

2025-02-12 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* configure.ac (gcc_cv_ld_eh_frame_ciev3): Remove.
* configure, config.in: Regenerate.
* config/sol2.cc (solaris_override_options): Remove.
* config/sol2.h (SUBTARGET_OVERRIDE_OPTIONS): Remove.
* config/sol2-protos.h (solaris_override_options): Remove.

doc: Update install.texi for GCC 15 on Solaris

Apart from minor updates, this patch is primarily an important caveat
about binutils PR ld/32580, which has broken the binutils 2.44 ld on
Solaris/x86.

Tested on i386-pc-solaris2.11.

2025-02-11 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* doc/install.texi (Specific, *-*-solaris2*): Updates for newer
Solaris 11.4 SRUs and binutils 2.44.

s390: Fix s390_valid_shift_count() for TI mode [PR118835]

During combine we may end up with

(set (reg:DI 66 [ _6 ])
     (ashift:DI (reg:DI 72 [ x ])
                (subreg:QI (and:TI (reg:TI 67 [ _1 ])
                                   (const_wide_int 0x0aaaaaaaaaaaaaabf))
                           15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).

Thus, be more strict here and accept only CONST_INT operands.  Done by
replacing immediate_operand() with const_int_operand() which is enough
since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P
and targetm.legitimate_constant_p which are always true for CONST_INT
operands.

While on it, fix indentation of the if block.

gcc/ChangeLog:

PR target/118835
* config/s390/s390.cc (s390_valid_shift_count): Reject shift
count operands which do not trivially fit the scheme of
address operands.

gcc/testsuite/ChangeLog:

* gcc.target/s390/pr118835.c: New test.

tree-optimization/118817 - fix ICE with VN CTOR simplification

The representation of CONSTRUCTOR nodes in VN NARY and gimple_match_op
do not agree so do not attempt to marshal between them.

PR tree-optimization/118817
* tree-ssa-sccvn.cc (vn_nary_simplify): Do not process
CONSTRUCTOR NARY or update from CONSTRUCTOR simplified
gimple_match_op.

* gcc.dg/pr118817.c: New testcase.

Daily bump.

loop-invariant: Treat inline-asm conditional trapping [PR102150]

So inline-asm is known not to trap BUT it can have undefined behavior
if made executed speculatively. This fixes the loop invariant pass to
treat it similarly as trapping cases. If the inline-asm could be executed
always, then it will be pulled out of the loop; otherwise it will be kept
inside the loop.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR rtl-optimization/102150
* loop-invariant.cc (find_invariant_insn): Treat inline-asm similar to
trapping instruction and only move them if always executed.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

ifcvt: Don't speculation move inline-asm [PR102150]

So unlike loop invariant motion, moving an inline-asm out of an
if is not always profitable and the cost estimate for the instruction
inside inline-asm is unknown.

This is a regression from GCC 4.6 which didn't speculatively move inline-asm
as far as I can tell.
Bootstrapped and tested on x86_64-linux-gnu.

PR rtl-optimization/102150
gcc/ChangeLog:

* ifcvt.cc (cheap_bb_rtx_cost_p): Return false if the insn
has an inline-asm in it.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

avr.opt.urls += -mcall-main

gcc/
* config/avr/avr.opt.urls: Add -mcall-main.

AVR: target/118806 - Add -mno-call-main to tweak running main().

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

   XCALL main
   XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no more pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

gcc/
PR target/118806
* config/avr/avr.opt (-mcall-main): New option and...
(avropt_call_main): ...variable.
* config/avr/avr.cc (avr_no_call_main_p): New variable.
(avr_insert_attributes) [-mno-call-main, main]: Add attributes
noreturn and section(".init9") to main.  Set avr_no_call_main_p.
(avr_file_end) [avr_no_call_main_p]: Define symbol __call_main.
* doc/invoke.texi (AVR Options) <-mno-call-main>: Document.
<-mnodevicelib>: Extend explanation.

c++: add fixed test [PR101740]

Fixed by r12-3643.

PR c++/101740

gcc/testsuite/ChangeLog:

* g++.dg/template/dtor12.C: New test.

vect: Set counts of early break exit blocks correctly [PR117790]

This adds missing code to correctly set the counts of the exit blocks we
create when building the CFG for a vectorized early break loop.

gcc/ChangeLog:

PR tree-optimization/117790
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Set profile counts for {main,alt}_loop_exit_block.

RISC-V: Vector pesudoinsns with x0 operand to use imm 0

A couple of Vector pseudoinstructions use x0 scalar which could be
inefficient on wider uarches due to regfile crossing.

Instead use the imm 0 form, which should be functionally equivalent.

pseudoinsn            orig insn with x0     this patch
--------------------  --------------------  -------------------
vneg.v vd,vs          vrsub.vx vd,vs,x0     vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm  vnsrl.wx vd,vs,x0,vm  vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm  vwadd.vx vd,vs,x0,vm  (imm not supported)

gcc/ChangeLog:
* config/riscv/vector.md: vncvt substitute vnsrl.
vnsrl with x0 replace with immediate 0.
vneg substitute vrsub.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change
expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.

https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html

PR target/118601

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.

Reported-by: Edwin Lu <ewlu@rivosinc.com>

Doc: Fix Texinfo warning in install.texi

For some time I've been seeing this Texinfo warning in my builds:

.../gcc/doc/install.texi:2295: warning: `.' or `,' must follow @xref, not f

Fixed thusly.

gcc/ChangeLog
* doc/install.texi: Add missing comma after @xref to fix warning.

Doc: Fix some typos and other nearby sloppy-writing issues

I spotted some typos in the GCC manual. Since often these are a sign
that the text was inserted without being proofread, I looked at the
context and fixed some grammar/punctuation/wording issues as well.

gcc/ChangeLog
* doc/extend.texi: Fix a bunch of typos and other writing bugs.
* doc/invoke.texi: Likewise.

Doc: Delete obsolete interface.texi chapter from GCC internals manual

The "Interfacing to GCC Output" chapter used to be part of the
user-facing GCC documentation but ended up in the GCC internals manual
when the two documents were separated in 2001.  It hasn't been updated
in any substantive way since then, and is now very bit-rotten.  (PCC is
no longer the "standard compiler" on any target, and the target-specific
issues mentioned are for very old architectures.)

Meanwhile, the GCC user documentation now has a chapter called "Binary
Compatibility" that covers ABI issues in a generic way and also covers
C++ compatibility.  Let's keep that one and throw out the obsolete
text that seems to predate the whole notion of an ABI.

gcc/ChangeLog
* Makefile.in (TEXI_GCCINT_FILES): Remove interface.texi.
* doc/gccint.texi (Top): Remove menu entry for the "interface" node,
and include of interface.texi.
* doc/interface.texi: Delete.

RISC-V: Drop __riscv_vendor_feature_bits

As discussed from RISC-V C-API PR #101 [1], As discussed in #96, current
interface is insufficient to support some cases, like a vendor buying a
CPU IP from the upstream vendor but using their own mvendorid and custom
features from the upstream vendor. In this case, we might need to add
these extensions for each downstream vendor many times. Thus, making
__riscv_vendor_feature_bits guarded by mvendorid is not a good idea. So,
drop __riscv_vendor_feature_bits for now, and we should have time to
discuss a better solution.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/101

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv-feature-bits.h (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
(struct riscv_vendor_feature_bits): Drop.

libgcc/ChangeLog:

* config/riscv/feature_bits.c (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
(__init_riscv_features_bits_linux): Drop.

Daily bump.

[PR target/115478] Accept ADD, IOR or XOR when combining objects with no bits in common

So the change to prefer ADD over IOR for combining two objects with no bits in
common is (IMHO) generally good. It has some minor fallout.

In particular the aarch64 port (and I suspect others) have patterns that
recognize IOR, but not PLUS or XOR for these cases and thus tests which
expected to optimize with IOR are no longer optimizing.

Roger suggested using a code iterator for this purpose. Richard S. suggested a
new match operator to cover those cases.

I really like the match operator idea, but as Richard S. notes in the PR it
would require either not validating the "no bits in common", which dramatically
reduces the utility IMHO or we'd need some work to allow consistent results
without polluting the nonzero bits cache.

So this patch goes back to Roger's idea of just using a match iterator in the
aarch64 backend (and presumably anywhere else we see this popping up).

Bootstrapped and regression tested on aarch64-linux-gnu where it fixes
bitint-args.c (as expected).

PR target/115478
gcc/
* config/aarch64/iterators.md (any_or_plus): New code iterator.
* config/aarch64/aarch64.md (extr<mode>5_insn): Use any_or_plus.
(extr<mode>5_insn_alt, extrsi5_insn_uxtw): Likewise.
(extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.

gcc/testsuite/
* gcc.target/aarch64/bitint-args.c: Update expected output.

c++: don't default -frange-for-ext-temps in -std=gnu++20 [PR188574]

Since -frange-for-ext-temps has been causing trouble, let's not enable it
by default in pre-C++23 GNU modes for GCC 15, and also allow disabling it in
C++23 and up.

PR c++/188574

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Only enable
-frange-for-ext-temps by default in C++23.

gcc/ChangeLog:

* doc/invoke.texi: Adjust -frange-for-ext-temps documentation.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/range-for3.C: Use -frange-for-ext-temps.
* g++.dg/cpp23/range-for4.C: Adjust expected result.

libgomp/ChangeLog:

* testsuite/libgomp.c++/range-for-4.C: Adjust expected result.

c++: change implementation of -frange-for-ext-temps [PR118574]

The implementation in r15-3840 used a novel technique of wrapping the entire
range-for loop in a CLEANUP_POINT_EXPR, which confused the coroutines
transformation. Instead let's use the existing extend_ref_init_temps
mechanism.

This does not revert all of r15-3840, only the parts that change how
CLEANUP_POINT_EXPRs are applied to range-for declarations.

PR c++/118574
PR c++/107637

gcc/cp/ChangeLog:

* call.cc (struct extend_temps_data): New.
(extend_temps_r, extend_all_temps): New.
(set_up_extended_ref_temp): Handle tree walk case.
(extend_ref_init_temps): Cal extend_all_temps.
* decl.cc (initialize_local_var): Revert ext-temps change.
* parser.cc (cp_convert_range_for): Likewise.
(cp_parser_omp_loop_nest): Likewise.
* pt.cc (tsubst_stmt): Likewise.
* semantics.cc (finish_for_stmt): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/range-for1.C: New test.

aarch64: Update fp8 dependencies

We agreed with LLVM developers to not enforce the architectural
dependencies between fp8 multiplication features, and they have already
been removed from LLVM and Binutils. Remove them from GCC as well.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def
(SSVE_FP8FMA): Adjust formatting.
(FP8DOT4): Replace FP8FMA dependency with FP8.
(SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
(FP8DOT2): Replace FP8DOT4 dependency with FP8.
(SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected
defines.
* gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target
pragmas.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c:
Ditto.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.

testsuite: Enable reduced parallel batch sizes

Various aarch64 tests attempt to reduce the batch size for parallel test
execution to a single test per batch, but it looks like the necessary
changes to gcc_parallel_test_run_p were accidentally omitted when the
aarch64-*-acle-asm.exp files were merged.  This patch corrects that
omission.

This does have a measurable performance impact when running a limited
number of tests.  For example, in aarch64-sve-acle-asm.exp the use of
torture options results in 16 compiler executions for each test; when
running two such tests I observed a total test duration of 3m39 without
this patch, and 1m55 with the patch.  A full batch of 10 tests would
have taken over 15 minutes to run on this machine.

gcc/testsuite/ChangeLog:

* lib/gcc-defs.exp
(gcc_runtest_parallelize_limit_minor): New global variable.
(gcc_parallel_test_run_p): Use new variable for batch size.

OpenMP: Pass a 3-way flag to omp_check_context_selector instead of a bool.

The OpenMP "begin declare variant" directive has slightly different
requirements for context selectors than regular "declare variant", so
something more than a bool is required to tell the error-checking routine
what to check.

gcc/ChangeLog
* omp-general.cc (omp_check_context_selector): Change
metadirective_p argument to a 3-way flag. Add extra check for
OMP_CTX_BEGIN_DECLARE_VARIANT.
* omp-general.h (enum omp_ctx_directive): New.
(omp_check_context_selector): Adjust declaration.

gcc/c/ChangeLog
* c-parser.cc (c_finish_omp_declare_variant): Update call to
omp_check_context_selector.
(c_parser_omp_metadirective): Likewise.

gcc/cp/ChangeLog
* parser.cc (cp_finish_omp_declare_variant): Update call to
omp_check_context_selector.
(cp_parser_omp_metadirective): Likewise.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Update call to
omp_check_context_selector.
(gfc_trans_omp_metadirective): Likewise.

OpenMP: Bug fixes for comparing context selectors

gcc/ChangeLog
* omp-general.cc (omp_context_selector_props_compare): Handle
arbitrary expressions in the "user" and "device_num" selectors.
(omp_context_selector_set_compare): Detect mismatch when one
selector specifies a score and the other doesn't.

lto: Add an entry for cold attribute to lto_gnu_attributes

PR 118125 is a performance regression stemming from the fact that we
lose the cold attribute of our __builtin_unreachable.  The attribute
is simply and silently dropped on the floor by decl_attributes (in
attribs.cc) in the process of building decls for builtins because it
cannot look it up in the gnu attribute name space by
lookup_scoped_attribute_spec.  For that not to happen it must be in
lto_gnu_attributes and this patch adds it there.

In comment 13 of the bug Andrew identified other attributes which are
in builtin-attrs.def but missing in lto_gnu_attributes but apart from
cold it seems that they are either not used in builtins.def or are
used in DEF_LIB_BUILTIN which I guess might be less critical?
Eventually I decided to go for the most simple of patches and only add
things if they are requested.  For the same reason I also did not add
any checking to the attribute "handle" callback or any exclusion check.
They seem to be mostly relevant before LTO FE kicks in to me, but
again, I'm happy to add any if they seem to be useful.

Since Ian fixed PR 118746, the same issue has also been fixed in the
Go front-end and so I have added a simple checking assert to the
redirect_to_unreachable function to make sure it has the intended
effect.

gcc/ChangeLog:

2025-02-03  Martin Jambor  <mjambor@suse.cz>

PR lto/118125
* ipa-fnsummary.cc (redirect_to_unreachable): Add checking assert
that the builtin_unreachable decl has attribute cold.

gcc/lto/ChangeLog:

2025-02-03  Martin Jambor  <mjambor@suse.cz>

PR lto/118125
* lto-lang.cc (lto_gnu_attributes): Add an entry for cold attribute.
(handle_cold_attribute): New function.

c++: Reject cdtors and conversion operators with a single * as return type [PR118304, PR118306]

We currently accept the following constructor declaration (clang, EDG
and MSVC do as well), and ICE on the destructor declaration

=== cut here ===
struct A {
*A ();
~A () = default;
};
=== cut here ===

The problem is that we end up in grokdeclarator with a cp_declarator of
kind cdk_pointer but no type, and we happily go through (if we have a
reference instead we eventually error out trying to form a reference to
void).

This patch makes sure that grokdeclarator errors out and strips the
invalid declarator when processing a cdtor (or a conversion operator
with no return type specified) with a declarator representing a pointer
or a reference type.

PR c++/118306
PR c++/118304

gcc/cp/ChangeLog:

* decl.cc (maybe_strip_indirect_ref): New.
(check_special_function_return_type): Take declarator as input.
Call maybe_strip_indirect_ref and error out if it returns true.
(grokdeclarator): Update call to
check_special_function_return_type.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.jason/operator.C: Adjust bogus test
expectation (char** vs char*).
* g++.dg/parse/constructor4.C: New test.
* g++.dg/parse/constructor5.C: New test.
* g++.dg/parse/conv_op2.C: New test.
* g++.dg/parse/default_to_int.C: New test.

sarif-replay: fix off-by-one in handling of "endColumn" (§3.30.8) [PR118792]

gcc/ChangeLog:
PR sarif-replay/118792
* libsarifreplay.cc (sarif_replayer::handle_region_object): Fix
off-by-one in handling of endColumn property so that the code
matches the comment and the SARIF spec (§3.30.8).

gcc/testsuite/ChangeLog:
PR sarif-replay/118792
* sarif-replay.dg/2.1.0-valid/error-with-note.sarif: Update
expected output to reflect fix to off-by-one error in handling of
"endColumn" property.
* sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Synchronize include/dwarf2.def with binutils

The contents of include/dwarf2.def have diverged between the gcc and
the binutils repositories.  Currently, it's impossible to build a combined
tree, as GCC won't build with the binutils version of dwarf2.def and binutils
won't build with the gcc version.  This patch realigns this file by copying
the defintion of DW_CFA_AARCH64_negate_ra_state_with_pc from binutils,
restoring the ability to build a combined source tree.

2025-02-11  Roger Sayle  <roger@nextmovesoftware.com>

include/ChangeLog
* dwarf2.def (DW_CFA_AARCH64_negate_ra_state_with_pc): Define.

tree-optimization/118817 - missed folding of PRE inserted code

When PRE inserts code it is not fully folded with following SSA
edges which can cause missed optimizations since the next fully
folding pass is way ahead, after strlen which in the PRs case leads
to diagnostics emitted on dead code.

The following mitigates the missed expression canonicalization that
happens during PHI translation where to be inserted expressions are
calculated. It is largely refactoring and eliminating the single
use of fully_constant_expression and otherwise leverages the
work already done by vn_nary_simplify by updating the NARY with
the simplified expression.

PR tree-optimization/118817
* tree-ssa-pre.cc (fully_constant_expression): Fold into
the single caller.
(phi_translate_1): Refactor folded in fully_constant_expression.
* tree-ssa-sccvn.cc (vn_nary_simplify): Update the NARY with
the simplified expression.

* g++.dg/lto/pr118817_0.C: New testcase.

testsuite: Fix g++.dg/modules/adl-5

This testcase wasn't running, because adl-5_a had the wrong extension.
adl-5_d should have been reporting an error because 'frob' is only
visible from within the 'hidden' module but this was missed.

gcc/testsuite/ChangeLog:

* g++.dg/modules/adl-5_a.c: Move to...
* g++.dg/modules/adl-5_a.C: ...here.
* g++.dg/modules/adl-5_d.C: Add errors.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Fix use-after-free of replaced friend instantiation [PR118807]

When instantiating a friend function, we call register_specialization
which adds it to the DECL_TEMPLATE_INSTANTIATIONS of the template.
However, in some circumstances we might immediately call pushdecl and
find an existing specialisation. In this case, when reregistering the
specialisation we also need to update the DECL_TEMPLATE_INSTANTIATIONS
list so that we don't try to access the freed spec again later.

PR c++/118807

gcc/cp/ChangeLog:

* pt.cc (reregister_specialization): Remove spec from
DECL_TEMPLATE_INSTANTIATIONS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118807.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

x86: Correct ASM_OUTPUT_SYMBOL_REF

x is not a macro argument. It just happens to work as final.cc passes
x for 2nd argument:

final.cc: ASM_OUTPUT_SYMBOL_REF (file, x);

PR target/118825
* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
SYM.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

config.gcc: Support mips*64*-linux-muslabi64 as ABI64 by default

LLVM introduced this triple support. Let's sync with it.

gcc
* config.gcc: Add mips*64*-linux-muslabi64 triple support.

MIPS: Add some floating point instructions support for MIPSr6

This patch adds some of the float point instructions from
MIPS32 Release 6(mips32r6) with their respective built-in
functions and tests:

    min_a_s, min_a_d
    max_a_s, max_a_d
    rint_s, rint_d
    class_s, class_d

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_minmax): Include
fclass type.
(i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
for __builtin_mipsr6_xxx.
(MIPSR6_BUILTIN_PURE): Same as above.
(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
New code_aliasing macros.
(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
max_a_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
(ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec.
(UNSPEC_FCLASS): Same as above.
(type): Add frint and fclass.
(fmin_a_<mode>): Generates MINA.fmt instructions.
(fmax_a_<mode>): Generates MAXA.fmt instructions.
(rint<mode>2): Generates RINT.fmt instructions.
(fclass_<mode>): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include
frint type.
(p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-class.c: New tests for MIPSr6
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.

Signed-off-by: Jie Mei <jie.mei@oss.cipunited.com>
Co-authored-by: Xi Ruoyao <xry111@xry111.site>

libphobos: Disable libphobos.phobos/std/concurrency.d on macOS 13+ [PR111628]

The libphobos.phobos_shared/std/concurrency.d test just hangs on macOS
13 and beyond and isn't even termintated after the testsuite timeout is
exceeded.  Thus, more and more concurrency.exe processes keep
accumulating, consuming CPU time for nothing.

To avoid this, this patch skips the test on macOS 13+.  The static test
SEGVs immediately instead, but I'm skipping it too for symmetry.

Tested on macOS 15 (where it becomes UNSUPPORTED) and 12 (where it still
PASSes).

I have no idea what happens on Darwin/arm64, so currently the skipping
is restricted to Darwin/x86_64.

2025-02-10  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

PR d/111628
* testsuite/libphobos.phobos/phobos.exp (libphobos_skip_tests):
Add libphobos.phobos/std/concurrency.d on macOS 13+.
* testsuite/libphobos.phobos_shared/phobos_shared.exp
(libphobos_skip_tests): Likewise for
libphobos.phobos_shared/std/concurrency.d

testsuite: LoongArch: Remove from btrunc, ceil, and floor effective target allowlist

Now that C default is C23, so we can no longer use LSX/LASX instructions
for these operations as the standard disallows raising INEXACT
exceptions. So LoongArch is no longer suitable for these effective
targets.

Fix the test failures on gcc.dg/vect/vect-rounding-*.c. For the old
standards or -ffp-int-builtin-inexact we already provide test coverage
with gcc.target/loongarch/vect-ftint.c.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_btrunc): Drop LoongArch.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floor): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_lfloor): Likewise.
(check_effective_target_vect_call_lfloorf): Likewise.

i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813]

When moving intrins around for AVX10 implementation in GCC 14,
the intrin _kshiftli_mask32 and _kshiftri_mask32 are wrongly
wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__",
leading to the intrin file not `-Wsystem-headers -Wundef` clean
since r14-4490.

gcc/ChangeLog:

PR target/118813
* config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
wrap.

PR modula2/118761: gm2 driver doesnt behave as gcc for -fhelp=BLA

This patch enables the gm2 driver to handle -fsyntax-only -fhelp=optimizers,
for example, correctly without terminating with gm2: fatal error:
no input files.

gcc/m2/ChangeLog:

PR modula2/118761
* gm2spec.cc (lang_specific_driver): Add case clauses for
OPT__help, OPT__help_ set in_added_libraries to 0 and early
return.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

libbacktrace: add cast to avoid undefined shift

Patch from pgerell@github.

* elf.c (elf_uncompress_lzma_block): Add casts to avoid
potentially shifting a value farther than its type size.

This improves an error message, avoiding at ... at.

gcc/fortran/ChangeLog:

PR fortran/24878
* interface.cc (compare_parameter): Better wording on
error message.

gcc/testsuite/ChangeLog:

PR fortran/24878
* gfortran.dg/interface_51.f90: Adjust expected error message.

Fortran: checking of pointer targets for structure constructors [PR56423]

Check the target of a pointer component in a structure constructor for same
ranks, and that the initial-data-target does not have vector subscripts.

PR fortran/56423

gcc/fortran/ChangeLog:

* resolve.cc (resolve_structure_cons): Check rank of pointer target;
reject pointer target with vector subscripts.

gcc/testsuite/ChangeLog:

* gfortran.dg/derived_constructor_comps_2.f90: Adjust test.
* gfortran.dg/derived_constructor_comps_8.f90: New test.

[gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has

Assume that a distro has configured, e.g., a gfx9-generic multilib but not
for gfx902. In that case, mkoffload would fail to link with "error:
incompatible mach". With this commit, an error is printed suggesting to try
the associated generic architecture instead. The behavior is unchanged if
there is a multilib available for the specific ISA or when there is also no
multilib for the generic ICA.

Note: The build of generic multilibs are currently not enabled by default;
they also require the linker/assembler of LLVM 19 or newer and, in particular,
for the execution a future ROCm release. (The next one? In any case, 6.3.2
does not support generic ISAs, yet.)

gcc/ChangeLog:

* config/gcn/mkoffload.cc (enum elf_arch_code): Add
EF_AMDGPU_MACH_AMDGCN_NONE.
(elf_arch): Use enum elf_arch_code as type.
(tool_cleanup): Silence warning by removing tailing '.' from error.
(get_arch_name): Return enum elf_arch_code.
(check_for_missing_lib): New; print fatal error if the multilib
is not available but it is for the associate generic ISA.
(main): Call it.

[gcn] install.texi: Update for new ISA targets and their requirements

GCN now supports several additional ISA targets such that no longer
all targets have a multilib by default; add a note about this, the
generic targets and the required LLVM (and ROCm) versions.

gcc/ChangeLog:

* doc/install.texi (GCN): Update section about multilibs and
required LLVM version.

ipa-cp: Perform operations in the appropriate types (PR 118097)

One of the testcases from PR 118097 and the one from PR 118535 show
that the fix to PR 118138 was incomplete.  We must not only make sure
that (intermediate) results of operations performed by IPA-CP are
fold_converted to the type of the destination formal parameter but we
also must decouple the these types from the ones in which operations
are performed.

This patch does that, even though we do not store or stream the
operation types, instead we simply limit ourselves to tcc_comparisons
and operations for which the first operand and the result are of the
same type as determined by expr_type_first_operand_type_p.  If we
wanted to go beyond these, we would indeed need to store/stream the
respective operation type.

ipa_value_from_jfunc needs an additional check that res_type is not
NULL because it is not called just from within IPA-CP (where we know
we have a destination lattice slot belonging to a defined parameter)
but also from inlining, ipa-fnsummary and ipa-modref where it is used
to examine a call to a function with variadic arguments and we do not
have types for the unknown parameters.  But we cannot really work with
those or estimate any benefits when it comes to them, so ignoring them
should be OK.

Even after this patch, ipa_get_jf_arith_result has a parameter called
res_type in which it performs operations for aggregate jump functions,
where we do not allow type conversions when constucting the jump
functions and the type is the type of the stored data.  In GCC 16, we
could relax this and allow conversions like for scalars.

gcc/ChangeLog:

2025-01-20  Martin Jambor  <mjambor@suse.cz>

PR ipa/118097
* ipa-cp.cc (ipa_get_jf_arith_result): Adjust comment.
(ipa_get_jf_pass_through_result): Removed.
(ipa_value_from_jfunc): Use directly ipa_get_jf_arith_result, do
not specify operation type but make sure we check and possibly
convert the result.
(get_val_across_arith_op): Remove the last parameter, always pass
NULL_TREE to ipa_get_jf_arith_result in its last argument.
(propagate_vals_across_arith_jfunc): Do not pass res_type to
get_val_across_arith_op.
(propagate_vals_across_pass_through): Add checking assert that
parm_type is not NULL.

gcc/testsuite/ChangeLog:

2025-01-24  Martin Jambor  <mjambor@suse.cz>

PR ipa/118097
* gcc.dg/ipa/pr118097.c: New test.
* gcc.dg/ipa/pr118535.c: Likewise.
* gcc.dg/ipa/ipa-notypes-1.c: Likewise.

arm: fix typo in dg-require-effective-target [PR118089]

Trivial typo.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c (dg-require-effective-target): Fix
typo in directive.

i386: Change RTL representation of bt[lq] [PR118623]

The following testcase is miscompiled because of RTL represententation
of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
               >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set.  These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it feels
the exact opposite to it.  If zero_extract is 1, flags is set
to comparison of 1 and 0 and that would mean using ne ne in the
if_then_else, and vice versa.

So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything.  The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly.  If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U.  The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.

2025-02-10  Jakub Jelinek  <jakub@redhat.com>

PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt<SWI48:mode>_mask): Likewise.
(*jcc_bt<mode>): Likewise.  Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise.
(*jcc_bt<SWI48:mode>_mask_1): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
(*bt<mode>_setncqi_2): Likewise.
(*bt<mode>_setc<mode>_mask): Likewise.

* gcc.c-torture/execute/pr118623.c: New test.

testsuite: Fix two testisms on x86 after PFA [PR118754]

These two tests now vectorize the result finding
loop with PFA and so the number of loops checked
fails.

This fixes them by adding #pragma GCC novector to
the testcases.

gcc/testsuite/ChangeLog:

PR testsuite/118754
* gcc.dg/vect/vect-tail-nomask-1.c: Add novector.
* gcc.target/i386/pr106010-8c.c: Likewise.

Daily bump.

[PR target/115123] Fix testsuite fallout from sinking heuristic change

Code sinking is just semantic preserving code motions, so it's a lot like
scheduling in that code motions can change the vector configuration needed at
various program points. That in turn can also change the number of vsetvls as
we may or may not be able to merge them after the code motions.

The sinking heuristics were twiddled several months ago resulting in a handful
of scan-asm failures. This patch adjusts the tests appropriately fixing
pr115123 (P3 regression).

PR target/115123
gcc/testsuite
* gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise.

[PR middle-end/117263] Avoid unused-but-set warning in genautomata

This is a trivial bug where a user wanted to define NDEBUG when building
genautomata, presumably trying to debug its behavior. This resulted in a
unused-but-set warning which caused the build to fail.

Dario included the trivial fixes in the PR which I put through the usual
bootstrap & regression test as well as compiling genautomata with NDEBUG.

Pushing to the trunk.

PR middle-end/117263
gcc/
* genautomata.cc (output_statistics): Avoid set but unnused warnings
when compiling with NDEBUG.

Test procedure dummy arguments against global symbols, if available.

this fixes a rather old PR from 2005, where a subroutine
could be passed and called as a function. This patch checks
for that, also for the reverse, and for wrong types of functions.

I expect that this will find a few bugs in dusty deck code...

gcc/fortran/ChangeLog:

PR fortran/24878
* interface.cc (compare_parameter): Check global subroutines
passed as actual arguments for subroutine / function and
function type.

gcc/testsuite/ChangeLog:

PR fortran/24878
* gfortran.dg/interface_51.f90: New test.

[RISC-V][PR target/118146] Fix ICE for unsupported modes

There's some special case code in the risc-v move expander to try and optimize
cases where the source is a subreg of a vector and the destination is a scalar
mode.

The code works fine except when we have no support for the given mode. ie HF or
BF when those extensions aren't enabled. We'll end up tripping an assert in
that case when we should have just let standard expansion do its thing.

Tested in my system for rv32 and rv64, but I'll wait for the pre-commit tester
to render a verdict before moving forward.

PR target/118146
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg
of vector source better to avoid ICE.

gcc/testsuite
* gcc.target/riscv/pr118146-1.c: New test.
* gcc.target/riscv/pr118146-2.c: New test.

Daily bump.

ad target/118764: Fix a typo in doc/extend.texi.

gcc/
PR target/118764
* doc/invoke.texi (AVR Options): Fix typos.

[PATCH] OpenMP: Improve Fortran metadirective diagnostics [PR107067]

The Fortran front end was giving an ICE instead of a user-friendly
diagnostic when variants of a metadirective variant had different
statement associations. The particular test case reported in the issue
also involved invalid placement of the "omp end metadirective" which
was not being diagnosed either.

gcc/fortran/ChangeLog
PR middle-end/107067
* parse.cc (parse_omp_do): Diagnose missing "OMP END METADIRECTIVE"
after loop.
(parse_omp_structured_block): Likewise for strictly structured block.
(parse_omp_metadirective_body): Use better test for variants ending
at different places. Issue a user diagnostic at the end if any
were inconsistent, instead of calling gcc_assert.

gcc/testsuite/ChangeLog
PR middle-end/107067
* gfortran.dg/gomp/metadirective-11.f90: Remove the dg-ice, update
for current behavior, and add more tests to exercise the new error
code.

libgcc: On FreeBSD use GCC's crt objects for static linking

Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's
crt objects for static linking. Otherwise it could mix crtbeginT.o
from the base system with libgcc's crtend.o, possibly leading to
segfaults.

libgcc:
PR target/118685
* config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts.

Signed-off-by: Dimitry Andric <dimitry@andric.com>

GCN, nvptx: 'sorry, unimplemented: exception handling not supported'

For GCN, this avoids ICEs further down the compilation pipeline. For nvptx,
there's effectively no change: in presence of exception handling constructs,
instead of 'sorry, unimplemented: target cannot support nonlocal goto', we
now emit 'sorry, unimplemented: exception handling not supported'.

Additionally, turn test cases into UNSUPPORTED if running into
'sorry, unimplemented: exception handling not supported'.

gcc/
* config/gcn/gcn.md (exception_receiver): 'define_expand'.
* config/nvptx/nvptx.md (exception_receiver): Likewise.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.
* gcc.dg/pr104464.c: Remove GCN XFAIL.
libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.

For a few test cases, clarify dependance on effective-target 'nonlocal_goto' into 'exceptions'

For example, for nvptx, these test cases currently indeed fail with
'sorry, unimplemented: target cannot support nonlocal goto'. However,
that's just an artefact of non-existing support for exception handling,
and these test cases already require effective-target 'exceptions'.

gcc/testsuite/
* gcc.dg/cleanup-12.c: Don't 'dg-skip-if "" { ! nonlocal_goto }'.
* gcc.dg/cleanup-13.c: Likewise.
* gcc.dg/cleanup-5.c: Likewise.
* gcc.dg/gimplefe-44.c: Don't
'dg-require-effective-target nonlocal_goto'.

nvptx doesn't actually support effective-target 'exceptions'

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_exceptions):
'return 0' for '[istarget nvptx-*-*]'.

BPF doesn't actually support effective-target 'exceptions' [PR118772]

PR target/118772
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_exceptions):
'return 0' for '[istarget bpf-*-*]'.

Clarify that effective-targets 'exceptions' and 'exceptions_enabled' are orthogonal

In Subversion r268025 (Git commit 3f21b8e3f7be32dd2b3624a2ece12f84bed545bb)
"Add dg-require-effective-target exceptions", effective-target 'exceptions'
was added, which "says that AMD GCN does not support [exception handling]".

In Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca)
"MSP430: Add -fno-exceptions multilib", effective-target 'exceptions_enabled'
was added "to check if the testing configuration supports exceptions".  Testing
"if exceptions are unsupported or disabled (e.g. by passing -fno-exceptions)"
works as expected if exception handling is disabled at the front-end level
('-fno-exceptions'; the "exceptions are [...] disabled" case):

    exceptions_enabled2066068.cc: In function ‘void foo()’:
    exceptions_enabled2066068.cc:3:27: error: exception handling disabled, use ‘-fexceptions’ to enable

However, effective-target 'exceptions_enabled' additionally assumes that
"If exceptions aren't supported [by the target], then they're not enabled".
This is not correct: it's not unlikely that, in presence of explicit/implicit
'-fexceptions', exception handling code gets fully optimized away by the
compiler, and therefore effective-target 'exceptions_enabled' test cases may
PASS even for targets that don't support effective-target 'exceptions'; these
two effective-targets are orthogonal concepts.

(For completeness: code with trivial instances of C++ exception handling may
translate into simple '__cxa_allocate_exception', '__cxa_throw' function calls
without requiring any back end-level "exceptions magic", and then trigger
unresolved symbols at link time, if these functions are not available.)

This change only affects GCN, as that one currently is the only target declared
as not supporting effective-target 'exceptions'.

gcc/
* doc/sourcebuild.texi (Effective-Target Keywords): Clarify that
effective-target 'exceptions' and 'exceptions_enabled' are
orthogonal.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-prune): Clarify effective-target
'exceptions_enabled'.
* lib/target-supports.exp
(check_effective_target_exceptions_enabled): Don't consider
effective-target 'exceptions'.
libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Clarify
effective-target 'exceptions_enabled'.

'gcc.dg/pr88870.c': don't 'dg-require-effective-target nonlocal_goto'

I confirm that back then, 'gcc.dg/pr88870.c' for nvptx failed due to
'sorry, unimplemented: target cannot support nonlocal goto', however at some
(indeterminate) point in time, that must've disappeared, and we now don't have
to 'dg-require-effective-target nonlocal_goto' anymore, and therefore get:

[-UNSUPPORTED:-]{+PASS:+} gcc.dg/pr88870.c {+(test for excess errors)+}

(And, if ever necessary again, this nowadays probably should
'dg-require-effective-target exceptions' instead of 'nonlocal_goto'.)

gcc/testsuite/
* gcc.dg/pr88870.c: Don't 'dg-require-effective-target nonlocal_goto'.

i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_<code><mode> define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes.  It works fine with
VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
<code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but eventhough the 8-bit and 16-bit element maxmin
is commutative too, the <mask_codefor><code><mode>3<mask_name>
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative.  So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.

The following patch fixes it by keeping the <code><mode>3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
<code><mode>3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into an * prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.

2025-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*avx512bw_<code><mode>3<mask_name>): ... this.  Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.

* gcc.target/i386/pr118776.c: New test.

x86: Verify that PUSH/POP can be skipped

For

int f(int);

int advance(int dz)
{
    if (dz > 0)
        return (dz + dz) * dz;
    else
        return dz * f(dz);
}

Before r15-1619-g3b9b8d6cfdf593

advance(int):
        push    rbx
        mov     ebx, edi
        test    edi, edi
        jle     .L2
        imul    ebx, edi
        lea     eax, [rbx+rbx]
        pop     rbx
        ret
.L2:
        call    f(int)
        imul    eax, ebx
        pop     rbx
        ret

After

advance(int):
        test    edi, edi
        jle     .L2
        imul    edi, edi
        lea     eax, [rdi+rdi]
        ret
.L2:
        sub     rsp, 24
        mov     DWORD PTR [rsp+12], edi
        call    f(int)
        imul    eax, DWORD PTR [rsp+12]
        add     rsp, 24
        ret

There's no call in if branch, it's not optimal to push rbx at the entry
of the function, it can be sinked to else branch. When "jle .L2" is not
taken, it can save one push instruction.  Update pr111673.c to verify
that this optimization isn't turned off.

PR rtl-optimization/111673
* gcc.target/i386/pr111673.c: Verify that PUSH/POP can be
skipped.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

aarch64: gimple fold aes[ed] [PR114522]

Instead of waiting to get combine/rtl optimizations fixed here. This fixes the
builtins at the gimple level. It should provide for slightly faster compile time
since we have a simplification earlier on.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

PR target/114522
* config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function.
(aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese
and crypto_aesd.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fortran: fix initialization of allocatable non-deferred character [PR59252]

PR fortran/59252

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_trans_subcomponent_assign): Initialize
allocatable non-deferred character with NULL properly.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocatable_char_1.f90: New test.

rs6000: Add cast to avoid pointer to integer comparison warning [PR117674]

2025-02-07 Peter Bergner <bergner@linux.ibm.com>

libgcc/
PR target/117674
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Add cast to
avoid comparison between pointer and integer warning.

Add a cache of recent lines

For larger files the file_cache line index will be spread out to make
the index fit into the fixed buffer, so any access to the non latest line
will need some skipping of lines.

Most accesses for line are near the latest line because
a diagnostic is likely near where the scanner is currently lexing.

Add a second cache for recent lines. It is organized as a ring buffer
and maintains the last 256 lines relative to the last input line.

With that, enabling -Wmisleading-indentation for the test case in
PR preprocessor/118168, is within the run-to-run variation.

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache::m_line_recent,
m_line_recent_first, m_line_recent_last): Add.
(file_cache_slot::evict): Clear new fields.
(file_cache_slot::create): Clear new fields.
(file_cache_slot::file_cache_slot): Initialize new fields.
(file_cache_slot::~file_cache_slot): Release m_line_recent.
(file_cache_slot::get_next_line): Maintain ring buffer of lines
in m_line_recent.
(file_cache_slot::read_line_num): Use m_line_recent to look up
recent lines quickly.

arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089]

For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry.  This saves
a couple of bytes in the resulting image.  This is a relatively niche
case as it's rare to push a single low register onto the stack, but
still worth getting right.

Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.

Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.

Secondly, we iterate over just the non-zero bits in the reg mask
rather than every bit and then checking if there's work to do for that
bit.

Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note.  The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.

gcc:
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
Don't emit LDR on thumb2 when POP can be used for smaller code.
Don't add a CFA adjust note when SP is popped off the stack.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c: New test.

arm: fix ICE due to fix for POP {PC} change

My earlier change for making the compiler prefer

POP {PC}

over

LDR PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC. This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.