git.ipfire.org Git - thirdparty/gcc.git/log

[PR118067][LRA]: Use the right mode to evaluate secondary memory reload

  In the PR case, LRA made insn alternative costly.  It happened
because LRA incorrectly found that the alternative needs 2nd memory
reload as the wrong mode for targetm.secondary_memory_needed was used.
This resulted in LRA cycling as an alternative with mask regs was
chosen.  The patch fixes the PR and add more debug printing which
could be useful in the future for debugging function
process_alt_operands.

gcc/ChangeLog:

PR rtl-optimization/1180167
* lra-constraints.cc (process_alt_operands): Use operand mode not
subreg reg mode.  Add and improve debugging prints for updating
losers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/118067
* gcc.target/i386/pr118067.c: New.

Allow CFI_cdesc_t in argument lists with -fc-prototypes.

This patch fixes and reorganizes dumping C prototypes. It makes the following
changes:

- BIND(C) types are now always output before any global symbols
- CFI_cdesc_t is issued for assumed shape and assumed rank arguments.
- BIND(C,NAME="...") entities were not always issued.

gcc/fortran/ChangeLog:

PR fortran/118359
* dump-parse-tree.cc (show_external_symbol): New function.
(write_type): Add prototype, put in restrictions on what not to dump.
(has_cfi_cdesc): New function.
(need_iso_fortran_binding): New function.
(gfc_dump_c_prototypes): Adjust to take only a file output. Add
"#include <ISO_Fortran_binding.h" if CFI_cdesc_t is found.
Traverse global namespaces to dump types and the globalsymol list
to dump external symbols.
(gfc_dump_external_c_prototypes): Traverse global namespaces.
(get_c_type_name): Handle CFI_cdesc_t.
(write_proc): Also pass array spec to get_c_type_name.
* gfortran.h (gfc_dump_c_prototypes): Adjust prototype.
* parse.cc (gfc_parse_file): Adjust call to gfc_dump_c_prototypes.

OpenMP: Improve error message for invalid directive in "assumes".

gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_assumption_clauses): Give a more specific
error message for invalid directives vs unknown names.

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_assumption_clauses): Give a more specific
error message for invalid directives vs unknown names.

gcc/fortran/ChangeLog
* openmp.cc (gfc_omp_absent_contains_clause): Use an Oxford comma
in the message.

gcc/testsuite/ChangeLog
* c-c++-common/gomp/assume-2.c: Adjust expected diagnostics.
* c-c++-common/gomp/assumes-2.c: Likewise.
* c-c++-common/gomp/begin-assumes-2.c: Likewise.
* gfortran.dg/gomp/allocate-6.f90: Likewise.
* gfortran.dg/gomp/assumes-2.f90: Likewise.

d: Add testcase for fixed PR116373

This was fixed in upstream, and merged in r15-6559-g332cf038fda109.

PR d/116373

gcc/testsuite/ChangeLog:

* gdc.dg/pr116373.d: New test.

OpenMP: Update "declare target"/OpenMP context interaction

The code and test case previously implemented the OpenMP 5.0 spec,
which said in section 2.3.1:

"For functions within a declare target block, the target trait is added
to the beginning of the set..."

In OpenMP 5.1, this was changed to
"For device routines, the target trait is added to the beginning of
the set..."

In OpenMP 5.2 and TR12, it says:
"For procedures that are determined to be target function variants
by a declare target directive..."

The definition of "device routine" in OpenMP 5.1 is confusing, but
certainly the intent of the later versions of the spec is clear that
it doesn't just apply to functions within a begin declare target/end
declare target block.

The only use of the "omp declare target block" function attribute was
to support the 5.0 language, so it can be removed. This patch changes
the context augmentation to use the "omp declare target" attribute
instead.

gcc/c-family/ChangeLog
* c-attribs.cc (c_common_gnu_attributes): Delete "omp declare
target block".

gcc/c/ChangeLog
* c-decl.cc (c_decl_attributes): Don't add "omp declare target
block".

gcc/cp/ChangeLog
* decl2.cc (cplus_decl_attributes): Don't add "omp declare target
block".

gcc/ChangeLog
* omp-general.cc (omp_complete_construct_context): Check
"omp declare target" attribute, not "omp declare target block".

gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-target-indirect-2.c : Adjust
expected output for removal of "omp declare target block".
* c-c++-common/gomp/declare-variant-8.c: Likewise, the variant
call to f20 is now resolved differently.
* c-c++-common/gomp/reverse-offload-1.c: Adjust expected output.
* gfortran.dg/gomp/declare-variant-8.f90: Likewise, both f18
and f20 now resolve to the variant. Delete obsolete comments.

OpenMP: Shared metadirective/dynamic selector tests for C and C++

gcc/testsuite/ChangeLog
* c-c++-common/gomp/adjust-args-6.c: New.
* c-c++-common/gomp/attrs-metadirective-1.c: New.
* c-c++-common/gomp/attrs-metadirective-2.c: New.
* c-c++-common/gomp/attrs-metadirective-3.c: New.
* c-c++-common/gomp/attrs-metadirective-4.c: New.
* c-c++-common/gomp/attrs-metadirective-5.c: New.
* c-c++-common/gomp/attrs-metadirective-6.c: New.
* c-c++-common/gomp/attrs-metadirective-7.c: New.
* c-c++-common/gomp/attrs-metadirective-8.c: New.
* c-c++-common/gomp/declare-variant-arg-exprs.c: New.
* c-c++-common/gomp/declare-variant-dynamic-1.c: New.
* c-c++-common/gomp/declare-variant-dynamic-2.c: New.
* c-c++-common/gomp/metadirective-1.c: New.
* c-c++-common/gomp/metadirective-2.c: New.
* c-c++-common/gomp/metadirective-3.c: New.
* c-c++-common/gomp/metadirective-4.c: New.
* c-c++-common/gomp/metadirective-5.c: New.
* c-c++-common/gomp/metadirective-6.c: New.
* c-c++-common/gomp/metadirective-7.c: New.
* c-c++-common/gomp/metadirective-8.c: New.
* c-c++-common/gomp/metadirective-construct.c: New.
* c-c++-common/gomp/metadirective-device.c: New.
* c-c++-common/gomp/metadirective-no-score.c: New.
* c-c++-common/gomp/metadirective-target-device-1.c: New.
* c-c++-common/gomp/metadirective-target-device-2.c: New.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/metadirective-1.c: New.
* testsuite/libgomp.c-c++-common/metadirective-2.c: New.
* testsuite/libgomp.c-c++-common/metadirective-3.c: New.
* testsuite/libgomp.c-c++-common/metadirective-4.c: New.
* testsuite/libgomp.c-c++-common/metadirective-5.c: New.
* testsuite/libgomp.c-c++-common/metadirective-late-1.c: New.
* testsuite/libgomp.c-c++-common/metadirective-late-2.c: New.
* testsuite/libgomp.c-c++-common/metadirective-target-device.c: New.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>

OpenMP: C++ support for metadirectives and dynamic selectors.

Additional shared C/C++ testcases are included in a subsequent patch in this
series.

gcc/cp/ChangeLog
PR middle-end/112779
PR middle-end/113904
* cp-tree.h (struct saved_scope): Add new field
x_processing_omp_trait_property_expr.
(processing_omp_trait_property_expr): New.
* parser.cc (cp_parser_skip_to_end_of_block_or_statement): Add
metadirective_p parameter and handle skipping over the parentheses
in a "for" statement.
(struct omp_metadirective_parse_data): New.
(mangle_metadirective_region_label): New.
(cp_parser_label_for_labeled_statement): Mangle label names in a
metadirective body.
(cp_parser_jump_statement): Likewise.
(cp_parser_omp_context_selector): Allow arbitrary expressions in
device_num and condition properties.
(cp_parser_omp_assumption_clauses): Handle C_OMP_DIR_META.
(analyze_metadirective_body): New.
(cp_parser_omp_metadirective): New.
(cp_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
* parser.h (struct cp_parser): Add omp_metadirective_state field.
* pt.cc (tsubst_omp_context_selector): New.
(tsubst_stmt): Handle OMP_METADIRECTIVE.
* semantics.cc (finish_id_expression_1): Don't diagnose use of
parameter outside function body in dynamic selector expressions here.

gcc/testsuite/
PR middle-end/112779
PR middle-end/113904
* c-c++-common/gomp/declare-variant-2.c: Adjust output for C++.
* g++.dg/gomp/declare-variant-class-1.C: New.
* g++.dg/gomp/declare-variant-class-2.C: New.
* g++.dg/gomp/metadirective-template-1.C: New.

libgomp/
PR middle-end/112779
PR middle-end/113904
* testsuite/libgomp.c++/metadirective-template-1.C: New.
* testsuite/libgomp.c++/metadirective-template-2.C: New.
* testsuite/libgomp.c++/metadirective-template-3.C: New.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>

OpenMP: Add C support for metadirectives and dynamic selectors.

Additional shared C/C++ testcases are included in a subsequent patch in this
series.

gcc/c-family/ChangeLog
PR middle-end/112779
PR middle-end/113904
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_variant_construct): Declare.
* c-gimplify.cc: Include omp-general.h.
(genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Add case for OMP_METADIRECTIVE.
* c-omp.cc (c_omp_directives): Fix entries for metadirective.
(c_omp_expand_variant_construct_r): New.
(c_omp_expand_variant_construct): New.
* c-pragma.cc (omp_pragmas): Add metadirective.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.

gcc/c/ChangeLog
PR middle-end/112779
PR middle-end/113904
* c-parser.cc (struct c_parser): Add omp_metadirective_state field.
(c_parser_skip_to_end_of_block_or_statement): Add metadirective_p
parameter and handle skipping over the parentheses in a "for"
statement.
(struct omp_metadirective_parse_data): New.
(mangle_metadirective_region_label): New.
(c_parser_label): Mangle label names in a metadirective body.
(c_parser_statement_after_labels): Likewise.
(c_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
(c_parser_omp_context_selector): Allow arbitrary expressions in
device_num and condition properties.
(c_parser_omp_assumption_clauses): Handle C_OMP_DIR_META.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.

gcc/testsuite/
PR middle-end/112779
* c-c++-common/gomp/declare-variant-2.c: Adjust expected output for C.
* gcc.dg/gomp/metadirective-1.c: New.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>

rs6000: Fix ICE for invalid constants in built-in functions

For invalid constant operand values used in built-in functions, return
const0_rtx to signify an error occurred during expansion.

2025-01-16 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Return
const0_rtx when there is an error.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-error.c: New test.

rs6000: Fix loop limit for built-in constant checking

The loop checking for built-in constant operand restrictions was missing
some operands due to the loop limit being too small. Fixing that exposed
a testsuite failure which is caused by a typo in the pmxvi4ger8pp definition
where we had made the PMASK field too small.

2025-01-16 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Use correct
array size for the loop limit.
* config/rs6000/rs6000-builtins.def: Fix field size for PMASK operand.

c++: Fix up reshape_* RAW_DATA_CST handling [PR118214]

The embed-17.C testcase is miscompiled and pr118214.C testcase used to be
miscompiled on the trunk before I've temporarily reverted the
r15-6339 C++ large initializer speed-up commit in r15-6448.
The problem is that reshape_* is only sometimes allowed to modify the given
CONSTRUCTOR in place (when reuse is true, so
                first_initializer_p
                && (complain & tf_error)
                && !CP_AGGREGATE_TYPE_P (elt_type)
                && !TREE_SIDE_EFFECTS (first_initializer_p)
) and at other times is not allowed to change it.  But the RAW_DATA_CST
handling was modifying those in place always, by peeling off whatever
was needed for the processing of the current element or set of elements
and leaving the rest in the original CONSTRUCTOR_ELTS, either as
RAW_DATA_CST with adjusted RAW_DATA_POINTER/RAW_DATA_LENGTH, or turning
it into INTEGER_CST if it would be a RAW_DATA_LENGTH == 1 RAW_DATA_CST.

The following patch fixes that by adding raw_idx member into
struct reshape_iter where we for the RAW_DATA_CST current elements track
offset into the current RAW_DATA_CST (how many elements were processed
from it already) and modifying the original CONSTRUCTOR_ELTS only if reuse
is true and we used the whole RAW_DATA_CST (with zero raw_idx); which means
just modifying its type in place.

2025-01-16  Jakub Jelinek  <jakub@redhat.com>

PR c++/118214
* decl.cc (struct reshape_iter): Add raw_idx member.
(cp_maybe_split_raw_data): Add inc_cur parameter, set *inc_cur,
don't modify original CONSTRUCTOR, use d->raw_idx to track index
into a RAW_DATA_CST d->cur->value.
(consume_init): Adjust cp_maybe_split_raw_data caller, increment
d->cur when cur_inc is true.
(reshape_init_array_1): Don't modify original CONSTRUCTOR when
handling RAW_DATA_CST d->cur->value and !reuse, instead use
d->raw_idx to track index into RAW_DATA_CST.
(reshape_single_init): Initialize iter.raw_idx.
(reshape_init_class): Adjust for introduction of d->raw_idx,
adjust cp_maybe_split_raw_data caller, do d->cur++ if inc_cur
rather than when it returns non-NULL.
(reshape_init_r): Check for has_designator_problem for second
half of _Complex earlier, also check for
error_operand_p (d->cur->value).  Use consume_init instead of
cp_maybe_split_raw_data with later conditional d->cur++.
(reshape_init): Initialize d.raw_idx.

* g++.dg/cpp/embed-17.C: New test.
* g++.dg/cpp0x/pr118214.C: New test.

c++: Change c++2b and gnu++2b to c++23 and gnu++23 in C++ diagnostics

This is something we should have done when -std=c++23 was made the
primary option and -std=c++2b turned into undocumented alias.

2025-01-16 Jakub Jelinek <jakub@redhat.com>

gcc/cp/
* parser.cc (cp_parser_lambda_declarator_opt,
cp_parser_statement, cp_parser_selection_statement,
cp_parser_jump_statement): Use -std=c++23 and -std=gnu++23
in diagnostics rather than -std=c++2b and -std=gnu++2b.
* semantics.cc (finish_compound_literal): Likewise.
* typeck2.cc (build_functional_cast_1): Likewise.
* decl.cc (start_decl): Likewise.
* constexpr.cc (ensure_literal_type_for_constexpr_object,
potential_constant_expression_1): Likewise.
gcc/c-family/
* c-lex.cc (interpret_float): Use -std=c++23 and -std=gnu++23
in diagnostics rather than -std=c++2b and -std=gnu++2b.

middle-end: Add early break conditions to vect-switch-search-line-fast.c [PR118451]

When this test was added initially it didn't add the early break effective
target tests.

This means that the test was "passing" (as in, it was failing to vectorize)
because many targets don't support early break.

But the test should not have been run for these targets. When the vectorizer
learned PFA the test started passing for 32-bit targets. I had adjusted the
testcase but fail to notice the requirements were wrong.

Thus this adds the extra guards, and on targets that don't support early break
this test will move to UNSUPPORTED, which is what it should have been all
along...

gcc/testsuite/ChangeLog:

PR testsuite/118451
* gcc.dg/vect/vect-switch-search-line-fast.c: Add early_break guards.

Extend OpenACC 'serial' testing, compiler-side

In 2019 commit 62aee289e4791fd68aace01accf433fb26b3eeae
"Add OpenACC 2.6 `serial' construct support", we didn't quite excel in test
suite coverage. Add some more, similar to OpenACC 'parallel' construct
testing.

gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-1.c: Extend OpenACC 'serial'
testing.
* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/Wparentheses-1.c: Likewise.
* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Likewise.
* c-c++-common/goacc/combined-directives-2.c: Likewise.
* c-c++-common/goacc/combined-directives-3.c: Likewise.
* c-c++-common/goacc/combined-directives.c: Likewise.
* c-c++-common/goacc/combined-reduction.c: Likewise.
* c-c++-common/goacc/data-clause-duplicate-1.c: Likewise.
* c-c++-common/goacc/default-1.c: Likewise.
* c-c++-common/goacc/default-2.c: Likewise.
* c-c++-common/goacc/default-3.c: Likewise.
* c-c++-common/goacc/default-4.c: Likewise.
* c-c++-common/goacc/default-5.c: Likewise.
* c-c++-common/goacc/if-clause-2.c: Likewise.
* c-c++-common/goacc/kernels-1.c: Likewise.
* c-c++-common/goacc/loop-1.c: Likewise.
* c-c++-common/goacc/loop-clauses.c: Likewise.
* c-c++-common/goacc/nesting-1.c: Likewise.
* c-c++-common/goacc/nesting-data-1.c: Likewise.
* c-c++-common/goacc/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/parallel-1.c: Likewise.
* c-c++-common/goacc/private-reduction-1.c: Likewise.
* c-c++-common/goacc/reduction-promotions.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
* c-c++-common/goacc/sb-1.c: Likewise.
* c-c++-common/goacc/sb-2.c: Likewise.
* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
* c-c++-common/goacc/uninit-if-clause.c: Likewise.
* c-c++-common/goacc/update-if_present-2.c: Likewise.
* g++.dg/goacc/template.C: Likewise.
* gfortran.dg/goacc/array-reduction.f90: Likewise.
* gfortran.dg/goacc/assumed.f95: Likewise.
* gfortran.dg/goacc/branch.f95: Likewise.
* gfortran.dg/goacc/coarray.f95: Likewise.
* gfortran.dg/goacc/coarray_2.f90: Likewise.
* gfortran.dg/goacc/combined-directives-3.f90: Likewise.
* gfortran.dg/goacc/combined-directives.f90: Likewise.
* gfortran.dg/goacc/common-block-1.f90: Likewise.
* gfortran.dg/goacc/common-block-2.f90: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/cray-2.f95: Likewise.
* gfortran.dg/goacc/cray.f95: Likewise.
* gfortran.dg/goacc/critical.f95: Likewise.
* gfortran.dg/goacc/data-clauses.f95: Likewise.
* gfortran.dg/goacc/default-1.f95: Likewise.
* gfortran.dg/goacc/default-2.f: Likewise.
* gfortran.dg/goacc/default-3.f95: Likewise.
* gfortran.dg/goacc/default-4.f: Likewise.
* gfortran.dg/goacc/default-5.f: Likewise.
* gfortran.dg/goacc/default_none.f95: Likewise.
* gfortran.dg/goacc/derived-types.f90: Likewise.
* gfortran.dg/goacc/firstprivate-1.f95: Likewise.
* gfortran.dg/goacc/gang-static.f95: Likewise.
* gfortran.dg/goacc/if.f95: Likewise.
* gfortran.dg/goacc/list.f95: Likewise.
* gfortran.dg/goacc/literal.f95: Likewise.
* gfortran.dg/goacc/loop-1-2.f95: Likewise.
* gfortran.dg/goacc/loop-1.f95: Likewise.
* gfortran.dg/goacc/loop-2-parallel-3.f95: Likewise.
* gfortran.dg/goacc/loop-3-2.f95: Likewise.
* gfortran.dg/goacc/loop-3.f95: Likewise.
* gfortran.dg/goacc/multi-clause.f90: Likewise.
* gfortran.dg/goacc/nested-parallelism.f90: Likewise.
* gfortran.dg/goacc/parameter.f95: Likewise.
* gfortran.dg/goacc/pr71704.f90: Likewise.
* gfortran.dg/goacc/private-3.f95: Likewise.
* gfortran.dg/goacc/pure-elemental-procedures.f95: Likewise.
* gfortran.dg/goacc/reduction-2.f95: Likewise.
* gfortran.dg/goacc/reduction-3.f95: Likewise.
* gfortran.dg/goacc/reduction-promotions.f90: Likewise.
* gfortran.dg/goacc/reduction.f95: Likewise.
* gfortran.dg/goacc/routine-3.f90: Likewise.
* gfortran.dg/goacc/routine-module-1.f90: Likewise.
* gfortran.dg/goacc/routine-module-2.f90: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
* gfortran.dg/goacc/sie.f95: Likewise.
* gfortran.dg/goacc/subarrays.f95: Likewise.
* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
* gfortran.dg/goacc/update-if_present-2.f90: Likewise.
* c-c++-common/goacc/loop-3.c: Rename to...
* c-c++-common/goacc/loop-3-parallel.c: ... this.
* gfortran.dg/goacc/parallel-kernels-clauses.f95: Rename to...
* gfortran.dg/goacc/compute_construct-clauses.f95: ... this.
Extend OpenACC 'serial' testing.
* gfortran.dg/goacc/parallel-kernels-regions.f95: Rename to...
* gfortran.dg/goacc/nesting-fail-1.f95: ... this. Extend OpenACC
'serial' testing.
* gfortran.dg/goacc/routine-external-level-of-parallelism-1.f:
Rename to...
* gfortran.dg/goacc/routine-external-level-of-parallelism-1-parallel.f:
... this.
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f:
Rename to...
* gfortran.dg/goacc/routine-external-level-of-parallelism-2-parallel.f:
... this.
* c-c++-common/goacc/loop-2-serial.c: New.
* c-c++-common/goacc/loop-3-serial.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-serial.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-serial.c: Likewise.
* c-c++-common/goacc/serial-1.c: Likewise.
* gfortran.dg/goacc/loop-2-serial-3.f95: Likewise.
* gfortran.dg/goacc/loop-2-serial-nested.f95: Likewise.
* gfortran.dg/goacc/loop-2-serial-tile.f95: Likewise.
* gfortran.dg/goacc/loop-2-serial.f95: Likewise.
* gfortran.dg/goacc/nested-reductions-1-serial.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-serial.f90: Likewise.
* gfortran.dg/goacc/private-explicit-serial-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-serial-1.f95: Likewise.
* gfortran.dg/goacc/routine-external-level-of-parallelism-1-serial.f:
Likewise.
* gfortran.dg/goacc/routine-external-level-of-parallelism-2-serial.f:
Likewise.
* gfortran.dg/goacc/serial-tree.f95: Likewise.

[OpenACC/Fortran testsuite] Use relative line numbers for a few DejaGnu directives

For easier maintenance.

gcc/testsuite/
* gfortran.dg/goacc/assumed.f95: Use relative line numbers for a
few DejaGnu directives.
* gfortran.dg/goacc/list.f95: Likewise.
* gfortran.dg/goacc/loop-1-2.f95: Likewise.
* gfortran.dg/goacc/loop-1.f95: Likewise.
* gfortran.dg/goacc/reduction.f95: Likewise.

Fortran: Create fresh ts.u.cl for result in gfc_get_symbol_for_expr [PR118441]

For intrinsic routines, called in libraries, the prototype is created from
the call via gfc_get_symbol_for_expr. For the actual arguments, it calls
gfc_copy_formal_args_intr which already ensures that the ts.u.cl is freshly
allocated.

This commit now ensures the same for character-returning functions.

PR fortran/118441

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_get_symbol_for_expr): Use
gfc_new_charlen for character-returning functions.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/intrinsic_pack_7.f90: New test.

libstdc++: Move std::basic_ostream to new internal header [PR99995]

This adds <bits/ostream.h> so that other headers don't need to include
all of <ostream>, which pulls in all of <format> since C++23 (for the
std::print and std::println overloads in <ostream>). This new header
allows the constrained operator<< in <bits/unique_ptr.h> to be defined
without all of std::format being compiled.

We could also replace <ostream> with <bits/ostream.h> in all of
<istream>, <fstream>, <sstream>, and <spanstream>. That seems more
likely to cause problems for users who might be expecting <sstream> to
define std::endl, for example. Although the standard doesn't guarantee
that, it is more reasonable than expecting <memory> to define it! We can
look into making those changes for GCC 16.

libstdc++-v3/ChangeLog:

PR libstdc++/99995
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/unique_ptr.h: Include bits/ostream.h instead of
ostream.
* include/std/ostream: Include new header.
* include/bits/ostream.h: New file.

libstdc++: Implement LWG 2937 for std::filesystem::equivalent [PR118158]

Do not report an error for (is_other(s1) && is_other(s2)) as the
standard originally said, nor for (is_other(s1) || is_other(s2)) as
libstdc++ was doing. We can compare inode numbers for special files and
so give sensible answers.

libstdc++-v3/ChangeLog:

PR libstdc++/118158
* src/c++17/fs_ops.cc (fs::equivalent): Remove error reporting
for is_other(s1) && is_other(s2) case, as per LWG 2937.
* testsuite/27_io/filesystem/operations/pr118158.cc: New test.

libstdc++: Check feature test macro for associative container node extraction

Replace some `__cplusplus > 201402L` preprocessor checks with more
expressive checks for the appropriate feature test macro.

libstdc++-v3/ChangeLog:

* include/bits/stl_map.h: Check __glibcxx_node_extract instead
of __cplusplus.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_tree.h: Likewise.

RISC-V: Update Xsfvqmacc and Xsfvfnrclip's testcases

Update Sifive Xsfvqmacc and Xsfvfnrclip extension's testcases.

version log:
Update synchronize LMUL settings with return type.

gcc/ChangeLog:

* config/riscv/vector.md: New attr set.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: Add vsetivli checking.
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: Ditto.

RISC-V: Update Xsfvfnrclip implementation.

Update implementation of Xsfvfnrclip, using return type as iterator.

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (expand_floattype): New func.
(main): New type.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_XFQF_OPS): New def.
(vint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vint8m2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_XFQF_OPS): Ditto.
(rvv_arg_type_info::get_xfqf_float_type): Ditto.
* config/riscv/riscv-vector-builtins.def (xfqf_vector): Ditto.
(xfqf_float): Ditto.
* config/riscv/riscv-vector-builtins.h
(struct rvv_arg_type_info): New function prototype.
* config/riscv/sifive-vector.md: Update iterator.
* config/riscv/vector-iterators.md: Ditto.

forwprop: Ensure that shuffle masks are VECTOR_CSTs

As reported in PR118487, it is possible that the mask parameter
of a __builtin_shuffle() is not a VECTOR_CST.
If this is the case and checking is enabled then an ICE is triggered.
Let's add a check to fix this issue.

PR tree-optimization/118487

gcc/ChangeLog:

* tree-ssa-forwprop.cc (recognise_vec_perm_simplify_seq):
Ensure that shuffle masks are VECTOR_CSTs.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118487.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

forwprop: Eliminate redundant calls to to_constant()

When extracting the amount of vector elements, we currently
first check if the value is a contant with is_constant(),
followed by obtaining the value with to_constant(),
which internally calls is_constant() again.
We can address this by using is_constant (T*), which also
provides the constant value.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (recognise_vec_perm_simplify_seq):
Eliminate redundant calls to to_constant().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

tree-optimization/115494 - PRE PHI translation and ranges

When we PHI translate dependent expressions we keep SSA defs in
place of the translated expression in case the expression itself
did not change even though it's context did and thus the validity
of ranges associated with it. That eventually leads to simplification
errors given we violate the precondition that used SSA defs fed to
vn_valueize are valid to use (including their associated ranges).
The following makes sure to replace those with new representatives
always, not only when the dependent expression translation changed it.

The fix was originally discovered by Michael Morin.

PR tree-optimization/115494
* tree-ssa-pre.cc (phi_translate_1): Always generate a
representative for translated dependent expressions.

* gcc.dg/torture/pr115494.c: New testcase.

Co-Authored-By: Mikael Morin <mikael@gcc.gnu.org>

tree-ssa-propagate: Special case lhs of musttail calls in may_propagate_copy [PR118430]

This patch ensures that VRP or similar passes don't replace the uses of lhs of
[[gnu::musttail]] calls with some constant (e.g. if the call is known is known
to return a singleton value range) etc. to make it more likely that it is actually
tail callable.

2025-01-16 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/118430
* tree-ssa-propagate.cc (may_propagate_copy): Return false if dest
is lhs of an [[gnu::musttail]] call.
(substitute_and_fold_dom_walker::before_dom_children): Formatting fix.

* c-c++-common/musttail14.c: Expect lhs on the must tail call calls.

tailc: Virtually undo IPA-VRP return value optimization for tail calls [PR118430]

When we have return somefn (whatever); where somefn is normally tail
callable and IPA-VRP determines somefn returns a singleton range, VRP
just changes the IL to
  somefn (whatever);
  return 42;
(or whatever the value in that range is).  The introduction of IPA-VRP
return value tracking then effectively regresses the tail call optimization.
This is even more important if the call is [[gnu::musttail]].

So, the following patch queries IPA-VRP whether a function returns singleton
range and if so and the value returned is identical to that, marks the
call as [tail call] anyway.  If expansion decides it can't use the tail
call, we'll still expand the return 42; or similar statement, and if it
decides it can use the tail call, that part will be ignored and we'll emit
normal tail call.

The reason it works is that the expand pass relies on the tailc pass to
do its job properly.
E.g. when we have
  <bb 2> [local count: 1073741824]:
  foo (x_2(D));
  baz (&v);
  v ={v} {CLOBBER(eos)};
  bar (x_2(D)); [tail call]
  return 1;
when expand_gimple_basic_block handles the bar (x_2(D)); call, it uses
          if (call_stmt && gimple_call_tail_p (call_stmt))
            {
              bool can_fallthru;
              new_bb = expand_gimple_tailcall (bb, call_stmt, &can_fallthru);
              if (new_bb)
                {
                  if (can_fallthru)
                    bb = new_bb;
                  else
                    {
                      currently_expanding_gimple_stmt = NULL;
                      return new_bb;
                    }
                }
            }
As it is actually tail callable during expansion of the bar (x_2(D)); call
stmt, expand_gimple_tailbb returns non-NULL and sets can_fallthru to false,
plus emits
;; bar (x_2(D)); [tail call]

(insn 11 10 12 2 (set (reg:SI 5 di)
        (reg/v:SI 99 [ x ])) "pr118430.c":35:10 -1
     (nil))

(call_insn/j 12 11 13 2 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("bar") [flags 0x3]  <function_decl 0x7fb39020bd00 bar>) [0 bar S1 A8])
            (const_int 0 [0]))) "pr118430.c":35:10 -1
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x3]  <function_decl 0x7fb39020bd00 bar>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (expr_list:SI (use (reg:SI 5 di))
        (nil)))

(barrier 13 12 0)
Because it doesn't fallthru, no further statements in the same bb are
expanded.  Now, if the bb with return happened to be in some other basic
block from the [tail call], it could be expanded but because the bb with
tail call ends with a barrier, it doesn't fall thru there and if nothing
else could reach it, we'd remove the unreachable bb RSN.

2025-01-16  Jakub Jelinek  <jakub@redhat.com>
    Andrew Pinski  <quic_apinski@quicinc.com>

PR tree-optimization/118430
* tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
symbol-summary.h, ipa-cp.h and ipa-prop.h.
(find_tail_calls): If ass_var is NULL and ret_var is not, check if
IPA-VRP has not found singleton return range for it.  In that case,
don't punt if ret_var is the only value in that range.  Adjust the
maybe_error_musttail message otherwise to diagnose different value
being returned from the caller and callee rather than using return
slot.  Formatting fixes.

* c-c++-common/musttail14.c: New test.
* c-c++-common/pr118430.c: New test.

docs: Fix up inline asm documentation

When writing the gcc-15/changes.html patch posted earlier, I've been
wondering where significant part of the Basic asm chapter went and the
problem was the insertion of a new @node in the middle of the Basic Asm
@node, plus not mentioning the new @node in the @menu.  So the asm constexpr
node was not normally visible and the Remarks for the section neither.

The following patch moves it before Asm Labels, removes the spots where it
described what hasn't been actually committed (constant expression can only
be a container with data/size member functions) and fixes up the toplevel
extended asm documentation (it was in the Basic Asm remarks and Extended Asm
section's remark still said it is not valid).

2025-01-16  Jakub Jelinek  <jakub@redhat.com>

* doc/extend.texi (Using Assembly Language with C): Add Asm constexprs
to @menu.
(Basic Asm): Move @node asm constexprs before Asm Labels, rename to
Asm constexprs, change wording so that it is clearer that the constant
expression actually must not return a string literal, just some specific
container and other wording tweaks.  Only talk about top-level for basic
asms in this @node, move restrictions on top-level extended asms to ...
(Extended Asm): ... here.

vec.h: Properly destruct elements in auto_vec auto storage [PR118400]

For T with non-trivial destructors, we were destructing objects in the
vector on release only when not using auto storage of auto_vec.

The following patch calls truncate (0) instead of m_vecpfx.m_num clearing,
and truncate takes care of that destruction:
  unsigned l = length ();
  gcc_checking_assert (l >= size);
  if (!std::is_trivially_destructible <T>::value)
    vec_destruct (address () + size, l - size);
  m_vecpfx.m_num = size;

2025-01-16  Jakub Jelinek  <jakub@redhat.com>

PR ipa/118400
* vec.h (vec<T, va_heap, vl_ptr>::release): Call m_vec->truncate (0)
instead of clearing m_vec->m_vecpfx.m_num.

Fix typo to avoid ICE.

gcc/ChangeLog:

PR target/118489
* config/i386/sse.md (VF1_AVX512BW): Fix typo.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr118489.c: New test.

tree-optimization/115895 - overrun with masked loop

The following addresses the fact that with loop masking (or regular
mask loads) we do not implement load shortening but we override
the case where we need that for correctness. Likewise when we
attempt to use loop masking to handle large trailing gaps we cannot
do so when there's this overrun case.

PR tree-optimization/115895
* tree-vect-stmts.cc (get_group_load_store_type): When we
might overrun because the group size is not a multiple of the
vector size we cannot use loop masking since that does not
implement the required load shortening.

* gcc.target/i386/vect-pr115895.c: New testcase.

lm32: In va_arg, skip to stack args with too few remaining reg args

lm32 has 8 register parameter slots, so many vararg functions end up
with several anonymous parameters passed in registers. If we run out
of registers in the middle of a parameter, the entire parameter will
be placed on the stack, skipping any remaining available registers.

The receiving varargs function doesn't know this, and will save all of
the possible parameter register values just below the stack parameters.

When processing a va_arg call with a type size larger than a single
register, we must check to see if it spans the boundary between
register and stack parameters. If so, we need to skip to the stack
parameters.

This is done by making va_list a structure containing the arg pointer
and the address of the start of the stack parameters. Boundary checks
are inserted in va_arg calls to detect this case and the address of
the parameter is set to the stack parameter start when the parameter
crosses over.

gcc/
* config/lm32/lm32.cc: Add several #includes.
(va_list_type): New.
(lm32_build_va_list): New function.
(lm32_builtin_va_start): Likewise.
(lm32_sd_gimplify_va_arg_expr): Likewise.
(lm32_gimplify_va_arg_expr): Likewise.

lm32: Compute pretend_size in setup_incoming_varargs even if no_rtl

gcc/
* config/lm32/lm32.cc (setup_incoming_varargs): Adjust the
conditionals so that pretend_size is always computed, even
if no_rtl is set.

lm32: Skip last named param when computing save varargs regs

The cumulative args value in setup_incoming_varargs points at
the last named parameter. We need to skip over that (if present) to
get to the first anonymous argument as we only want to include
those anonymous args in the saved register block.

gcc/
* config/lm32/lm32.cc (lm32_setup_incoming_varargs): Skip last
named parameter when preparing to flush registers with unnamed
arguments to th stack.

lm32: Args with arg.named false still get passed in regs

* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed
arguments in registers too, just like named arguments.

Fix an incorrect file header comment for the core2 scheduling model

Committed as obvious.

gcc/ChangeLog:

* config/i386/x86-tune-sched-core.cc: Fix incorrect comment.

Fix setting of call graph node AutoFDO count

We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

if (scale)
bb->count = bb->count.apply_scale (num, den);

This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count propagation.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR gcov-profile/116743
* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
and the entry block count.

Daily bump.

libstdc++: Fix use of internal feature test macro in test

This test should use __cpp_lib_ios_noreplace rather than the internal
__glibcxx_ios_noreplace macro.

libstdc++-v3/ChangeLog:

* testsuite/27_io/ios_base/types/openmode/case_label.cc: Use
standard feature test macro not internal one.

libstdc++: Fix fancy pointer test for std::set

The alloc_ptr.cc test for std::set tries to use C++17 features
unconditionally, and tries to use the C++23 range members which haven't
been implemented for std::set yet.

Some of the range checks are left in place but commented out, so they
can be added after the ranges members are implemented. Others (such as
prepend_range) are not valid for std::set at all.

Also fix uses of internal feature test macros in two other tests, which
should use the standard __cpp_lib_xxx macros.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc:
Guard node extraction checks with feature test macro. Remove
calls to non-existent range members.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
Use standard macro not internal one.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
Likewise.

match: Simplify `1 >> x` into `x == 0` [PR102705]

This in this PR we have missed optimization where we miss that,
`1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of
optimizing this, the easiest and simpliest is to simplify `1 >> x` into
just `x == 0` as those are equivalant (if we ignore out of range values for x).
we already have an optimization for `(1 >> X) !=/== 0` so the only difference
here is we don't need the `!=/== 0` part to do the transformation.

So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplfied
`1 >> x` -> `x == 0` one.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/102705

gcc/ChangeLog:

* match.pd (`(1 >> X) != 0`): Remove pattern.
(`1 >> x`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr105832-2.c: Update testcase.
* gcc.dg/tree-ssa/pr96669-1.c: Likewise.
* gcc.dg/tree-ssa/pr102705-1.c: New test.
* gcc.dg/tree-ssa/pr102705-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

doc: cleanup trailing whitespace

gcc/ChangeLog:

* doc/extend.texi: Cleanup trailing whitespace.

doc: trivial grammar fix

We say 'a constant .. expression' elsewhere. Fix the grammar.

gcc/ChangeLog:

* doc/extend.texi: Add 'a' for grammar fix.

libstdc++: Fix reversed args in unreachable assumption [PR109849]

libstdc++-v3/ChangeLog:

PR libstdc++/109849
* include/bits/vector.tcc (vector::_M_range_insert): Fix
reversed args in length calculation.

Fortran: reject NULL as source-expr in ALLOCATE with SOURCE= or MOLD= [PR71884]

PR fortran/71884

gcc/fortran/ChangeLog:

* resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as
source-expr.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr71884.f90: New test.

c++: Handle RAW_DATA_CST in unify [PR118390]

This patch uses the count_ctor_elements function to fix up
unify deduction of array sizes.

2025-01-15 Jakub Jelinek <jakub@redhat.com>

PR c++/118390
* cp-tree.h (count_ctor_elements): Declare.
* call.cc (count_ctor_elements): No longer static.
* pt.cc (unify): Use count_ctor_elements instead of
CONSTRUCTOR_NELTS.

* g++.dg/cpp/embed-20.C: New test.
* g++.dg/cpp0x/pr118390.C: New test.

AArch64: Update neoverse512tvb tuning

Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc:
* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.

AArch64: Add FULLY_PIPELINED_FMA to tune baseline

Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).

gcc:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
* config/aarch64/tuning_models/ampere1b.h: Remove redundant
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
* config/aarch64/tuning_models/neoversev2.h: Likewise.

AArch64: Deprecate -mabi=ilp32

ILP32 was originally intended to make porting to AArch64 easier. Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years. There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).

gcc:
* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
* doc/invoke.texi: Document -mabi=ilp32 as deprecated.

gcc/testsuite:
* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
* gcc.target/aarch64/pr100518.c: Likewise.
* gcc.target/aarch64/pr113114.c: Likewise.
* gcc.target/aarch64/pr80295.c: Likewise.
* gcc.target/aarch64/pr94201.c: Likewise.
* gcc.target/aarch64/pr94577.c: Likewise.
* gcc.target/aarch64/sve/pr108603.c: Likewise.

bpf: set index entry for a VAR_DECL in CO-RE relocs

CO-RE accesses with non pointer struct variables will also generate a
"0" string access within the CO-RE relocation.
The first index within the access string, has sort of a different
meaning then the remaining of the indexes.
For i0:i1:...:in being an access index for "struct A a" declaration, its
semantics are represented by:
(&a + (sizeof(struct A) * i0) + offsetof(i1:...:in)

gcc/ChangeLog:
* config/bpf/core-builtins.cc (compute_field_expr): Change
VAR_DECL outcome in switch case.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-builtin-1.c: Correct test.
* gcc.target/bpf/core-builtin-2.c: Correct test.
* gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.

bpf: calls do not promote attr access_index on lhs

When traversing gimple to introduce CO-RE relocation entries to
expressions that are accesses to attributed perserve_access_index types,
the access is likely to be split in multiple gimple statments.
In order to keep doing the proper CO-RE convertion we will need to mark
the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
such that the gimple traverser will further convert the sub-expressions.

This patch makes sure that this LHS marking will not happen in case the
gimple statement is a function call, which case it is no longer
expecting to keep generating CO-RE accesses with the remaining of the
expression.

gcc/ChangeLog:

* config/bpf/core-builtins.cc
(make_gimple_core_safe_access_index): Fix in condition.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-attr-calls.c: New test.

bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT

Based on observation within bpf-next selftests and comparisson of GCC
and clang compiled code, the BPF loader expects all CO-RE relocations to
point to BTF non const and non volatile type nodes.

gcc/ChangeLog:

* btfout.cc (get_btf_kind): Remove static from function definition.
* config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type
is not a const or volatile.
* ctfc.h (btf_dtd_kind): Add prototype for function.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-attr-const.c: New test.

c++: Implement mangling of RAW_DATA_CST [PR118278]

As the following testcases show (mangle80.C only after reversion of the
temporary reversion of C++ large array speedup commit), RAW_DATA_CST can
be seen during mangling of some templates and we ICE because
the mangler doesn't handle it.

The following patch handles it and mangles it the same as a sequence of
INTEGER_CSTs that were used previously instead.
The only slight complication is that if ce->value is the last nonzero
element, we need to skip the zeros at the end of RAW_DATA_CST.

2025-01-03 Jakub Jelinek <jakub@redhat.com>

PR c++/118278
* mangle.cc (write_expression): Handle RAW_DATA_CST.

* g++.dg/abi/mangle80.C: New test.
* g++.dg/cpp/embed-19.C: New test.

c++: handle decltype in nested-name-spec printing [PR118139]

Compiling this test, we emit:

error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function

where the DECLTYPE_TYPE isn't printed properly. This patch fixes that
to print:

error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function

PR c++/118139

gcc/cp/ChangeLog:

* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
a computed-type-specifier.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/decltype1.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Fix comments in test that reference wrong subclause of C++11

libstdc++-v3/ChangeLog:

* testsuite/28_regex/traits/char/transform_primary.cc: Fix
subclause numbering in references to the standard.

middle-end: Fix incorrect type replacement in operands_equals [PR118472]

In g:3c32575e5b6370270d38a80a7fa8eaa144e083d0 I made a mistake and incorrectly
replaced the type of the arguments of an expression with the type of the
expression. This is of course wrong.

This reverts that change and I have also double checked the other replacements
and they are fine.

gcc/ChangeLog:

PR middle-end/118472
* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
replacement.

gcc/testsuite/ChangeLog:

PR middle-end/118472
* gcc.dg/pr118472.c: New test.

Annotate dbg_line_numbers table

The following adds /* <num> */ to dbg_line_numbers so there's the chance
to more easily lookup the ID of the match.pd line number used for
dumping when you want to debug a speicific replacement.  It also cuts
the lines down to 10 entries.

  static int dbg_line_numbers[1267] = {
        /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195,
        /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063,
...

* genmatch.cc (define_dump_logs): Make reverse lookup in
dbg_line_numbers easier by adding comments with start index
and cutting number of elements per line to 10.

testsuite: i386: Fix expected vectoriziation in pr105493.c

As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
The test code contains two loops, where the first one is exected to be
vectorized. The commit that broke that vectorization was the first of
several that enabled vectorization of both loops.
Now, that GCC can vectorize the whole function, let's adjust this test
to expect vectorization of both loops by ensuring that we don't write
to the helper-array 'tmp'.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
PR target/117079

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105493.c: Fix expected vectorization

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

OpenMP/C++: Fix 'declare variant' for struct-returning functions [PR118486]

To find the variant declaration, a call is constructed in
omp_declare_variant_finalize_one, which gives here:
TARGET_EXPR <D.3010, variant_fn ()>

Extracting now the function declaration failed and gave the bogus
error: could not find variant declaration

Solution: Use the 2nd argument of the TARGET_EXPR and continue.

PR c++/118486

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): When resolving
the variant to use, handle variant calls with TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/declare-variant-11.C: New test.

ipa: Initialize/release global obstack in process_new_functions [PR116068]

Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
before running a pass list and bitmap_obstack_release (NULL); after that,
while process_new_functions wasn't doing that and with the new r15-130
bitmap_alloc checking that results in ICE.

2025-01-15 Jakub Jelinek <jakub@redhat.com>

PR ipa/116068
* cgraphunit.cc (symbol_table::process_new_functions): Call
bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
around processing the functions.

* gcc.dg/graphite/pr116068.c: New test.

c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387]

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthetize
operator<=> which will crash at runtime due to returning without a return
statement. That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthetize with tf_none, see the static_cast is invalid, don't
add error_mark_node statement silently, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2

On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote:
> That seems pretty obviously what we want, and is what the other compilers
> implement.

This patch implements it then.

2025-01-15 Jakub Jelinek <jakub@redhat.com>

PR c++/118387
* method.cc (build_comparison_op): Set bad if
std::strong_ordering::equal doesn't convert to rettype.

* g++.dg/cpp2a/spaceship-err6.C: Expect another error.
* g++.dg/cpp2a/spaceship-synth17.C: Likewise.
* g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise.
* g++.dg/cpp2a/spaceship-synth-neg7.C: New test.

* testsuite/25_algorithms/default_template_value.cc
(Input::operator<=>): Use auto as return type rather than bool.

c++: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124]

The previous patch made me look around some more and I found
maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either,
while the RAW_DATA_CST is properly split during finish_compound_literal,
it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong,
RAW_DATA_CST could stand for far more initializers.

2025-01-15 Jakub Jelinek <jakub@redhat.com>

PR c++/118124
* cp-tree.h (build_array_of_n_type): Change second argument type
from int to unsigned HOST_WIDE_INT.
* tree.cc (build_array_of_n_type): Likewise.
* call.cc (count_ctor_elements): New function.
(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
(convert_like_internal): Use length from init's type instead of
len when handling the maybe_init_list_as_array case.

* g++.dg/cpp0x/initlist-opt5.C: New test.

c++: Fix ICEs with large initializer lists or ones including #embed [PR118124]

The following testcases ICE due to RAW_DATA_CST not being handled where it
should be during ck_list conversions.

The last 2 testcases started ICEing with r15-6339 committed yesterday
(speedup of large initializers), the first two already with r15-5958
(#embed optimization for C++).

For conversion to initializer_list<unsigned char> or char/signed char
we can optimize and keep RAW_DATA_CST with adjusted type if we report
narrowing errors if needed, for others this converts each element
separately.

2025-01-15 Jakub Jelinek <jakub@redhat.com>

PR c++/118124
* call.cc (convert_like_internal): Handle RAW_DATA_CST in
ck_list handling. Formatting fixes.

* g++.dg/cpp/embed-15.C: New test.
* g++.dg/cpp/embed-16.C: New test.
* g++.dg/cpp0x/initlist-opt3.C: New test.
* g++.dg/cpp0x/initlist-opt4.C: New test.

RISC-V: Fix code gen for reduction with length 0 [PR118182]

`.MASK_LEN_FOLD_LEFT_PLUS`(or `mask_len_fold_left_plus_m`) is expecting the
return value will be the start value even if the length is 0.

However current code gen in RISC-V backend is not meet that semantic, it will
result a random garbage value if length is 0.

Let example by current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64:
        # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
        vsetvli zero,a5,e64,m1,ta,ma
        vfmv.s.f        v2,fa5     # insn 1
        vfredosum.vs    v1,v1,v2   # insn 2
        vfmv.f.s        fa5,v1     # insn 3

insn 1:
- vfmv.s.f won't do anything if VL=0, which means v2 will contain garbage value.
insn 2:
- vfredosum.vs won't do anything if VL=0, and keep vd unchanged even TA.
(v-spec say: `If vl=0, no operation is performed and the destination register
is not updated.`)
insn 3:
- vfmv.f.s will move the value from v1 even VL=0, so this is safe.

So how we fix that? we need two fix for that:

1. insn 1: need always execute with VL=1, so that we can guarantee it will
           always work as expect.
2. insn 2: Add new pattern to force `vd` use same reg as `vs1` (start value) for
           all reduction patterns, then we can guarantee vd[0] will contain the
           start value when vl=0

For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
we have to add _VL0_SAFE variant reduction to force `vd` use same reg as `vs1`
(start value).

Change since V3:
- Rename _AV to _VL0_SAFE for readability.
- Use non-VL0_SAFE version if VL is const or VLMAX.
- Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
- Two more testcase.

gcc/ChangeLog:

PR target/118182
* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
argument for expand_reduction.
(*widen_reduc_plus_scal_<mode>): Ditto.
(*fold_left_widen_plus_<mode>): Ditto.
(*mask_len_fold_left_widen_plus_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
expand_reduction.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_and_scal_<mode>): Ditto.
(reduc_ior_scal_<mode>): Ditto.
(reduc_xor_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_smax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_fmax_scal_<mode>): Ditto.
(reduc_fmin_scal_<mode>): Ditto.
(fold_left_plus_<mode>): Ditto.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_reduction): Add one more
argument for reduction code for vl0-safe.
* config/riscv/riscv-protos.h (expand_reduction): Ditto.
* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
reduction.
(ANY_REDUC_VL0_SAFE): New.
(ANY_WREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
(reduc_op): Add _VL0_SAFE variant of reduction.
(order) Ditto.
* config/riscv/vector.md (@pred_<reduc_op><mode>): New.

gcc/testsuite/ChangeLog:

PR target/118182
* gfortran.target/riscv/rvv/pr118182.f: New.
* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.

Fix SLP scalar costing with stmts also used in externals

When we have the situation of an external SLP node that is
permuted the scalar stmts recorded in the permute node do not
mean the scalar computation can be removed. We are removing
those stmts from the vectorized_scalar_stmts for this reason
but we fail to check this set when we cost scalar stmts. Note
vectorized_scalar_stmts isn't a complete set so also pass
scalar_stmts_in_externs and check that.

The following fixes this.

This shows in PR115777 when we avoid vectorizing the load, but
on it's own doesn't help the PR yet.

PR tree-optimization/115777
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
cost a scalar stmt that needs to be preserved.

lto: Remove link() to fix build with MinGW [PR118238]

I used link() to create cheap copies of Incremental LTO cache contents
to prevent their deletion once linking is finished.
This is unnecessary, since output_files are deleted in our lto-plugin
and not in the linker itself.

Bootstrapped/regtested on x86_64-linux.
lto-wrapper now again builds on MinGW. Though so far I have not setup
MinGW to be able to do full bootstrap.
Ok for trunk?

PR lto/118238

gcc/ChangeLog:

* lto-wrapper.cc (run_gcc): Remove link() copying.

lto-plugin/ChangeLog:

* lto-plugin.c (cleanup_handler):
Keep output_files when using Incremental LTO.
(onload): Detect Incremental LTO.

[RISC-V][PR target/118170] Add HF div/sqrt reservation

Clearly an oversight in the generic-ooo model caught by the checking code.  I
should have realized it was generic-ooo as we don't have a pipeline description
for the tenstorrent design yet, just the costing model.

The patch was extracted from the BZ which indicated Anton was the author, so I
kept that.  I'm listed as co-author just in case someone wants to complain
about the testcase in the future.  I didn't do any notable lifting here.

Thanks Peter and Anton!

PR target/118170
gcc/
* config/riscv/generic-ooo.md (generic_ooo_float_div_half): New
reservation.

gcc/testsuite
* gcc.target/riscv/pr118170.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

[PR rtl-optimization/109592] Simplify nested shifts

> The BZ in question is a failure to recognize a pair of shifts as a sign
> extension.
>
> I originally thought simplify-rtx would be the right framework to
> address this problem, but fwprop is actually better.  We can write the
> recognizer much simpler in that framework.
>
> fwprop already simplifies nested shifts/extensions to the desired RTL,
> but it's not considered profitable and we throw away the good work done
> by fwprop & simplifiers.
>
> It's hard to see a scenario where nested shifts or nested extensions
> that simplify down to a single sign/zero extension isn't a profitable
> transformation.  So when fwprop has nested shifts/extensions that
> simplifies to an extension, we consider it profitable.
>
> This allow us to simplify the testcase on rv64 with ZBB enabled from a
> pair of shifts to a single byte or half-word sign extension.

Hmm.  So just to summarise something that was discussed in the PR
comments, this is a case where combine's expand_compound_operation/
make_compound_operation wrangler hurts us, because the process isn't
idempotent, and combine produces two complex instructions:

(insn 6 3 7 2 (set (reg:DI 137 [ _3 ])
        (ashift:DI (reg:DI 139 [ x ])
            (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3}
     (expr_list:REG_DEAD (reg:DI 139 [ x ])
        (nil)))
(insn 12 7 13 2 (set (reg/i:DI 10 a0)
        (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0)
                (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend}
     (expr_list:REG_DEAD (reg:DI 137 [ _3 ])
        (nil)))

given two simple instructions:

(insn 6 3 7 2 (set (reg:SI 137 [ _3 ])
        (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip}
     (expr_list:REG_DEAD (reg/v:DI 136 [ x ])
        (nil)))
(insn 7 6 12 2 (set (reg:DI 138 [ _3 ])
        (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal}
     (expr_list:REG_DEAD (reg:SI 137 [ _3 ])
        (nil)))

If I run with -fdisable-rtl-combine then late_combine1 already does the
expected transformation.

Although it would be nice to fix combine, that might be difficult.
If we treat combine as immutable then the options are:

(1) Teach simplify-rtx to simplify combine's output into a single sign_extend.

(2) Allow fwprop1 to get in first, before combine has a chance to mess
    things up.

The patch goes for (2).

Is that a fair summary?

Playing devil's advocate, I suppose one advantage of (1) is that it
would allow the optimisation even if the original rtl looked like
combine's output.  And fwprop1 doesn't distinguish between cases in
which the source instruction disappears from cases in which the source
instruction is kept.  Thus we could transform:

  (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
  (set (reg:DI R3) (sign_extend:DI (reg:SI R2)))

into:

  (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
  (set (reg:DI R3) (sign_extend:DI (reg:QI R1)))

which increases the register pressure between the two instructions
(since R2 and R1 are both now live).  In general, there could be
quite a gap between the two instructions.

On the other hand, even in that case, fwprop1 would be parallelising
the extensions.  And since we're talking about unary operations,
even two-address targets would allow R1 to be extended without
tying the source and destination.

Also, it seems relatively unlikely that expand would produce code
that looks like combine's, since the gimple optimisers should have
simplified it into conversions.

So initially I was going to agree that it's worth trying in fwprop.  But...

[ commentary on Jeff's original approach dropped. ]

So it seems like it's a bit of a mess 🙁

If we do try to fix combine, I think something like the attached
would fit within the current scheme.  It is a pure shift-for-shift
transformation, avoiding any extensions.

Will think more about it, but wanted to get the above stream of
consciousness out before I finish for the day 🙂

PR rtl-optimization/109592
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify nested shifts with subregs.

gcc/testsuite
* gcc.target/riscv/pr109592.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: Adjust expected output

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

Daily bump.

c++: dump-lang-raw with obj_type_ref fields

Raw dump of lang tree was missing information about virtual method call.
The information is provided in "tok" field of obj_type_ref.

gcc/ChangeLog:

* tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/lang-dump-1.C: New test.

d: Merge upstream dmd, druntime d6f693b46a, phobos 336bed6d8.

D front-end changes:

- Import latest fixes from dmd v2.110.0-rc.1.

D runtime changes:

- Import latest fixes from druntime v2.110.0-rc.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-rc.1.

Included in the merge are fixes for the following PRs:

PR d/118438
PR d/118448
PR d/118449

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd d6f693b46a.
* d-incpath.cc (add_import_paths): Update for new front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime d6f693b46a.
* src/MERGE: Merge upstream phobos 336bed6d8.
* testsuite/libphobos.init_fini/custom_gc.d: Adjust test.

[ifcombine] robustify decode_field_reference

Arrange for decode_field_reference to use local variables throughout,
to modify the out parms only when we're about to return non-NULL, and
to drop the unused case of NULL pand_mask, that had a latent failure
to detect signbit masking.

for gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Rebustify to set
out parms only when returning non-NULL.
(fold_truth_andor_for_ifcombine): Bail if
decode_field_reference returns NULL. Add complementary assert
on r_const's not being set when l_const isn't.

c++: re-enable NSDMI CONSTRUCTOR folding [PR118355]

In c++/102990 we had a problem where massage_init_elt got {},
digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 },
and we crashed in the call to fold_non_dependent_init because
a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*.  So we avoided
calling fold_non_dependent_init for a CONSTRUCTOR.

But that broke the following test, where we no longer fold the
CONST_DECL in
  { .type = ZERO }
to
  { .type = 0 }
and then process_init_constructor_array does:

            if (next != error_mark_node
                && (initializer_constant_valid_p (next, TREE_TYPE (next))
                    != null_pointer_node))
              {
                /* Use VEC_INIT_EXPR for non-constant initialization of
                   trailing elements with no explicit initializers.  */
                picflags |= PICFLAG_VEC_INIT;

because { .type = ZERO } isn't initializer_constant_valid_p.  Then we
create a VEC_INIT_EXPR and say we can't convert the argument.

So we have to fold the elements of the CONSTRUCTOR.  We just can't
instantiate the elements in a template.

This also fixes c++/118047.

PR c++/118047
PR c++/118355

gcc/cp/ChangeLog:

* typeck2.cc (massage_init_elt): Call fold_non_dependent_init
unless for a CONSTRUCTOR in a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-list10.C: New test.
* g++.dg/cpp0x/nsdmi-list9.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

OpenMP: Remove dead code from declare variant reimplementation

After reimplementing late resolution of "declare variant", the
declare_variant_alt and calls_declare_variant_alt flags on struct
cgraph_node are no longer used by anything. For the purposes of
marking functions that need late resolution, the
has_omp_variant_constructs flag has replaced
calls_declare_variant_alt.

Likewise struct omp_declare_variant_entry, struct
omp_declare_variant_base_entry, and the hash tables used to store
these structures are no longer needed, since the information needed for
late resolution is now stored in the gomp_variant_construct nodes.

In addition, some obsolete code that was temporarily ifdef'ed out
instead of delted in order to produce a more readable patch for the
previous installment of this series is now removed entirely.

There are no functional changes in this patch, just removing dead code.

gcc/ChangeLog
* cgraph.cc (symbol_table::create_edge): Don't set
calls_declare_variant_alt in the caller.
* cgraph.h (struct cgraph_node): Remove declare_variant_alt
and calls_declare_variant_alt flags.
* cgraphclones.cc (cgraph_node::create_clone): Don't copy
calls_declare_variant_alt bit.
* gimplify.cc: Remove previously #ifdef-ed out code.
* ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code
referencing declare_variant_alt bit.
* ipa.cc (symbol_table::remove_unreachable_nodes): Likewise.
* lto-cgraph.cc (lto_output_node): Remove references to deleted
bits.
(output_refs): Adjust code referencing declare_variant_alt bit.
(input_overwrite_node): Remove references to deleted bits.
(input_refs): Adjust code referencing declare_variant_alt bit.
* lto-streamer-out.cc (lto_output): Likewise.
* lto-streamer.h (omp_lto_output_declare_variant_alt): Delete.
(omp_lto_input_declare_variant_alt): Delete.
* omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs
bit to trigger pass_omp_device_lower instead of
calls_declare_variant_alt.
* omp-general.cc (struct omp_declare_variant_entry): Delete.
(struct omp_declare_variant_base_entry): Delete.
(struct omp_declare_variant_hasher): Delete.
(omp_declare_variant_hasher::hash): Delete.
(omp_declare_variant_hasher::equal): Delete.
(omp_declare_variants): Delete.
(omp_declare_variant_alt_hasher): Delete.
(omp_declare_variant_alt_hasher::hash): Delete.
(omp_declare_variant_alt_hasher::equal): Delete.
(omp_declare_variant_alt): Delete.
(omp_lto_output_declare_variant_alt): Delete.
(omp_lto_input_declare_variant_alt): Delete.
(includes): Delete unnecessary include of gt-omp-general.h.
* omp-offload.cc (execute_omp_device_lower): Remove references
to deleted bit.
(pass_omp_device_lower::gate): Likewise.
* omp-simd-clone.cc (simd_clone_create): Likewise.
* passes.cc (ipa_write_summaries): Likeise.
* symtab.cc (symtab_node::get_partitioning_class): Likewise.
* tree-inline.cc (expand_call_inline): Likewise.
(tree_function_versioning): Likewise.

gcc/lto/ChangeLog
* lto-partition.cc (lto_balanced_map): Adjust code referencing
deleted declare_variant_alt bit.

OpenMP: Re-work and extend context selector resolution

This patch reimplements the middle-end support for "declare variant"
and extends the resolution mechanism to also handle metadirectives
(PR112779).  It also adds partial support for dynamic selectors
(PR113904) and fixes a selector scoring bug reported as PR114596.  I hope
this rewrite also improves the engineering aspect of the code, e.g. more
comments to explain what it is doing.

In most cases, variant constructs can be resolved either in the front
end or during gimplification; if the variant with the highest score
has a static selector, then only that one is emitted.  In the case
where it has a dynamic selector, it is resolved into a (possibly nested)
if/then/else construct, testing the run-time predicate for each selector
sorted by decreasing order of score until a static selector is found.

In some cases, notably a variant construct in a "declare simd"
function which may or may not expand into a simd clone, it may not be
possible to score or sort the variants until later in compilation (the
ompdevlow pass).  In this case the gimplifier emits a loop containing
a switch statement with the variants in arbitrary order and uses the
OMP_NEXT_VARIANT tree node as a placeholder to control which variant
is tested on each iteration of the loop.  It looks something like:

     switch_var = OMP_NEXT_VARIANT (0, state);
     loop_label:
     switch (switch_var)
       {
       case 1:
        if (dynamic_selector_predicate_1)
          {
            alternative_1;
            goto end_label;
          }
        else
          {
            switch_var = OMP_NEXT_VARIANT (1, state);
            goto loop_label;
          }
       case 2:
         ...
       }
      end_label:

Note that when there are no dynamic selectors, the loop is unnecessary
and only the switch is emitted.

Finally, in the ompdevlow pass, the OMP_NEXT_VARIANT magic cookies are
resolved and replaced with constants.  When compiling with -O we can
expect that the loop and switch will be discarded by subsequent
optimizations and replaced with direct jumps between the cases,
eventually arriving at code with similar control flow to the
early-resolution cases.

This approach is somewhat simpler than the one currently used for
handling declare variant in that all possible code paths are already
included in the output of the gimplifier, so it is not necessary to
maintain hidden references or data structures pointing to expansions of
not-yet-resolved variant constructs and special logic for passing them
through LTO (see PR lto/96680).

A possible disadvantage of this expansion strategy is that dead code
for unused variants in the switch can remain when compiling without
-O.  If this turns out to be a critical problem (e.g., an unused case
includes calls to functions not available to the linker) perhaps some
further processing could be performed by default after ompdevlow to
simplify such constructs.

In order to make this patch more readable for review purposes, it
leaves the existing code for "declare variant" resolution (including
the above-mentioned LTO hack) in place, in some cases just ifdef-ing
out functions that won't compile due to changed interfaces for
dependencies.  The next patch in the series will delete all the
now-unused code.

gcc/ChangeLog
PR middle-end/114596
PR middle-end/112779
PR middle-end/113904

* Makefile.in (GTFILES): Move omp-general.h earlier; required
because of moving score_wide_int declaration to that file.
* cgraph.h (struct cgraph_node): Add has_omp_variant_constructs flag.
* cgraphclones.cc (cgraph_node::create_clone): Propagate
has_omp_variant_constructs flag.
* gimplify.cc (omp_resolved_variant_calls): New.
(expand_late_variant_directive): New.
(find_supercontext): New.
(gimplify_variant_call_expr): New.
(gimplify_call_expr): Adjust parameters to make fallback available.
Update processing for "declare variant" substitution.
(is_gimple_stmt): Add OMP_METADIRECTIVE.
(omp_construct_selector_matches): Ifdef out unused function.
(omp_get_construct_context): New.
(gimplify_omp_dispatch): Replace call to deleted function
omp_resolve_declare_variant with equivalent logic.
(expand_omp_metadirective): New.
(expand_late_variant_directive): New.
(gimplify_omp_metadirective): New.
(gimplify_expr): Adjust arguments to gimplify_call_expr.  Add
cases for OMP_METADIRECTIVE, OMP_NEXT_VARIANT, and
OMP_TARGET_DEVICE_MATCHES.
(gimplify_function_tree): Initialize/clean up
omp_resolved_variant_calls.
* gimplify.h (omp_construct_selector_matches): Delete declaration.
(omp_get_construct_context): Declare.
* lto-cgraph.cc (lto_output_node): Write has_omp_variant_constructs.
(input_overwrite_node): Read has_omp_variant_constructs.
* omp-builtins.def (BUILT_IN_OMP_GET_NUM_DEVICES): New.
* omp-expand.cc (expand_omp_taskreg): Propagate
has_omp_variant_constructs.
(expand_omp_target): Likewise.
* omp-general.cc (omp_maybe_offloaded): Add construct_context
parameter; use it instead of querying gimplifier state.  Add
comments.
(omp_context_name_list_prop): Do not test lang_GNU_Fortran in
offload compiler, just use the string as-is.
(expr_uses_parm_decl): New.
(omp_check_context_selector): Add metadirective_p parameter.
Remove sorry for target_device selector.  Add additional checks
specific to metadirective or declare variant.
(make_omp_metadirective_variant): New.
(omp_construct_traits_match): New.
(omp_context_selector_matches): Temporarily ifdef out the previous
code, and add a new implementation based on the old one with
different parameters, some unnecessary loops removed, and code
re-indented.
(omp_target_device_matches_on_host): New.
(resolve_omp_target_device_matches): New.
(omp_construct_simd_compare): Support matching of "simdlen" and
"aligned" clauses.
(omp_context_selector_set_compare): Make static.  Adjust call to
omp_construct_simd_compare.
(score_wide_int): Move declaration to omp-general.h.
(omp_selector_is_dynamic): New.
(omp_device_num_check): New.
(omp_dynamic_cond): New.
(omp_context_compute_score): Ifdef out the old version and
re-implement with different parameters.
(omp_complete_construct_context): New.
(omp_resolve_late_declare_variant): Ifdef out.
(omp_declare_variant_remove_hook): Likewise.
(omp_resolve_declare_variant): Likewise.
(sort_variant): New.
(omp_get_dynamic_candidates): New.
(omp_declare_variant_candidates): New.
(omp_metadirective_candidates): New.
(omp_early_resolve_metadirective): New.
(omp_resolve_variant_construct): New.
* omp-general.h (score_wide_int): Moved here from omp-general.cc.
(struct omp_variant): New.
(make_omp_metadirective_variant): Declare.
(omp_construct_traits_to_codes): Delete declaration.
(omp_check_context_selector): Adjust parameters.
(omp_context_selector_matches): Likewise.
(omp_context_selector_set_compare): Delete declaration.
(omp_resolve_declare_variant): Likewise.
(omp_declare_variant_candidates): Declare.
(omp_metadirective_candidates): Declare.
(omp_get_dynamic_candidates): Declare.
(omp_early_resolve_metadirective): Declare.
(omp_resolve_variant_construct): Declare.
(omp_dynamic_cond): Declare.
* omp-offload.cc (resolve_omp_variant_cookies): New.
(execute_omp_device_lower): Call the above function to resolve
variant directives.  Remove call to omp_resolve_declare_variant.
(pass_omp_device_lower::gate): Check has_omp_variant_construct bit.
* omp-simd-clone.cc (simd_clone_create): Propagate
has_omp_variant_constructs bit.
* tree-inline.cc (expand_call_inline): Likewise.
(tree_function_versioning): Likewise.

gcc/c/ChangeLog
PR middle-end/114596
PR middle-end/112779
PR middle-end/113904
* c-parser.cc (c_finish_omp_declare_variant): Update for changes
to omp-general.h interfaces.

gcc/cp/ChangeLog
PR middle-end/114596
PR middle-end/112779
PR middle-end/113904
* decl.cc (omp_declare_variant_finalize_one): Update for changes
to omp-general.h interfaces.
* parser.cc (cp_finish_omp_declare_variant): Likewise.

gcc/fortran/ChangeLog
PR middle-end/114596
PR middle-end/112779
PR middle-end/113904
* trans-openmp.cc (gfc_trans_omp_declare_variant): Update for changes
to omp-general.h interfaces.

gcc/testsuite/
PR middle-end/114596
PR middle-end/112779
PR middle-end/113904
* c-c++-common/gomp/declare-variant-12.c: Adjust expected behavior
per PR114596.
* c-c++-common/gomp/declare-variant-13.c: Test that this is resolvable
after gimplification, not just final resolution.
* c-c++-common/gomp/declare-variant-14.c: Tweak testcase to ensure
that -O causes dead code to be optimized away.
* gfortran.dg/gomp/declare-variant-12.f90: Adjust expected behavior
per PR114596.
* gfortran.dg/gomp/declare-variant-13.f90: Test that this is resolvable
after gimplification, not just final resolution.
* gfortran.dg/gomp/declare-variant-14.f90: Tweak testcase to ensure
that -O causes dead code to be optimized away.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>
Co-Authored-By: Marcel Vollweiler <marcel@codesourcery.com>

OpenMP: New tree nodes for metadirective and dynamic selector support.

This patch adds basic support for three new tree node types that will
be used in subsequent patches to support OpenMP metadirectives and
dynamic selectors.

OMP_METADIRECTIVE is the internal representation of parsed OpenMP
metadirective constructs.  It's produced by the front ends and is expanded
during gimplification.

OMP_NEXT_VARIANT is used as a "magic cookie" for late resolution of
variant constructs that cannot be fully resolved during
gimplification, used to set the controlling variable of a switch
statement that branches to the next alternative once the candidate
list can be filtered and sorted.  These nodes are expanded into
constants in the ompdevlow pass.  In some gimple passes, they need to
be treated as constants.

OMP_TARGET_DEVICE_MATCHES is a similar "magic cookie" used to resolve
the target_device dynamic selector.  It is wrapped in an OpenMP target
construct, and can be resolved to a constant in the ompdevlow pass.

gcc/ChangeLog:
* doc/generic.texi (OpenMP): Document OMP_METADIRECTIVE,
OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES.
* fold-const.cc (operand_compare::hash_operand): Ignore
the new nodes.
* gimple-expr.cc (is_gimple_val): Allow OMP_NEXT_VARIANT
and OMP_TARGET_DEVICE_MATCHES.
* gimple.cc (get_gimple_rhs_num_ops): OMP_NEXT_VARIANT and
OMP_TARGET_DEVICE_MATCHES are both GIMPLE_SINGLE_RHS.
* tree-cfg.cc (tree_node_can_be_shared): Allow sharing of
OMP_NEXT_VARIANT.
* tree-inline.cc (remap_gimple_op_r): Ignore subtrees of
OMP_NEXT_VARIANT.
* tree-pretty-print.cc (dump_generic_node): Handle OMP_METADIRECTIVE,
OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES.
* tree-ssa-operands.cc (operands_scanner::get_expr_operands):
Ignore operands of OMP_NEXT_VARIANT and OMP_TARGET_DEVICE_MATCHES.
* tree.def (OMP_METADIRECTIVE): New.
(OMP_NEXT_VARIANT): New.
(OMP_TARGET_DEVICE_MATCHES): New.
* tree.h (OMP_METADIRECTIVE_VARIANTS): New.
(OMP_METADIRECTIVE_VARIANT_SELECTOR): New.
(OMP_METADIRECTIVE_VARIANT_DIRECTIVE): New.
(OMP_METADIRECTIVE_VARIANT_BODY): New.
(OMP_NEXT_VARIANT_INDEX): New.
(OMP_NEXT_VARIANT_STATE): New.
(OMP_TARGET_DEVICE_MATCHES_SELECTOR): New.
(OMP_TARGET_DEVICE_MATCHES_PROPERTIES): New.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>

[ifcombine] check and extend constants to compare with bitfields

Add logic to check and extend constants compared with bitfields, so
that fields are only compared with constants they could actually
equal.  This involves making sure the signedness doesn't change
between loads and conversions before shifts: we'd need to carry a lot
more data to deal with all the possibilities.

for  gcc/ChangeLog

PR tree-optimization/118456
* gimple-fold.cc (decode_field_reference): Punt if shifting
after changing signedness.
(fold_truth_andor_for_ifcombine): Check extension bits in
constants before clipping.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118456
* gcc.dg/field-merge-21.c: New.
* gcc.dg/field-merge-22.c: New.

RISC-V: Fix vsetvl compatibility predicate [PR118154].

In PR118154 we emit strided stores but the first of those does not
always have the proper VTYPE.  That's because we erroneously delete
a necessary vsetvl.

In order to determine whether to elide

(1)
      Expr[7]: VALID (insn 116, bb 17)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)

when e.g.

(2)
      Expr[3]: VALID (insn 360, bb 15)
        Demand fields: demand_sew_lmul demand_avl
        SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)
        VL=(reg:DI 13 a3 [345])

is already available, we use
sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p.

(1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8.  (2) has ratio = 64,
though, so we cannot directly elide (1).

This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p.

PR target/118154

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define.
(pre_vsetvl::earliest_fuse_vsetvl_info): Use.
(pre_vsetvl::pre_global_vsetvl_info): New predicate with equal
ratio.
* config/riscv/riscv-vsetvl.def: Use.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.

match: Keep conditional in simplification to constant [PR118140].

In PR118140 we simplify

  _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);

to 1:

Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1

when _46 == 1.  This happens by removing the conditional and applying
a | 1 = 1.  Normally we re-introduce the conditional and its else value
if needed but that does not happen here as we're not dealing with a
vector type.  For correctness's sake, we must not remove the conditional
even for non-vector types.

This patch re-introduces a COND_EXPR in such cases.  For PR118140 this
result in a non-vectorized loop.

PR middle-end/118140

gcc/ChangeLog:

* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
COND_EXPR when we simplified to a scalar gimple value but still
have an else value.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/autovec/pr118140.c: New test.

c++/modules: Don't emit imported deduction guides [PR117397]

The ICE in the linked PR is caused because name lookup finds duplicate
copies of the deduction guides, causing a checking assert to fail.

This is ultimately because we're exporting an imported guide; when name
lookup processes 'dguide-5_b.H' it goes via the 'tt_entity' path and
just returns the entity from 'dguide-5_a.H'.  Because this doesn't ever
go through 'key_mergeable' we never set 'BINDING_VECTOR_GLOBAL_DUPS_P'
and so deduping is not engaged, allowing duplicate results.

Currently I believe this to be a perculiarity of the ANY_REACHABLE
handling for deduction guides; in no other case that I can find do we
emit bindings purely to imported entities.  As such, this patch fixes
this problem from that end, by ensuring that we simply do not emit any
imported deduction guides.  This avoids the ICE because no duplicates
need deduping to start with, and should otherwise have no functional
change because lookup of deduction guides will look at all reachable
modules (exported or not) regardless.

Since we're now deliberately not emitting imported deduction guides we
can use LOOK_want::NORMAL instead of LOOK_want::ANY_REACHABLE, since the
extra work to find as-yet undiscovered deduction guides in transitive
importers is not necessary here.

PR c++/117397

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_deduction_guides): Don't emit
imported deduction guides.
(depset::hash::finalize_dependencies): Add check for any
bindings referring to imported entities.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dguide-5_a.H: New test.
* g++.dg/modules/dguide-5_b.H: New test.
* g++.dg/modules/dguide-5_c.H: New test.
* g++.dg/modules/dguide-6.h: New test.
* g++.dg/modules/dguide-6_a.C: New test.
* g++.dg/modules/dguide-6_b.C: New test.
* g++.dg/modules/dguide-6_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

Ada: add missing support for the S/390 and RISC-V architectures

...to the object file reader present in the run-time library.

gcc/ada/
PR ada/118459
* libgnat/s-objrea.ads (Object_Arch): Add S390 and RISCV.
* libgnat/s-objrea.adb (EM_S390): New named number.
(EM_RISCV): Likewise.
(ELF_Ops.Initialize): Deal with EM_S390 and EM_RISCV.
(Read_Address): Deal with S390 and RISCV.

tree-optimization/118405 - ICE with vector(1) T vs T load

When vectorizing a load we are now checking alignment before emitting
a vector(1) T load instead of blindly assuming it's OK when we had
a scalar T load. For reasons we're not handling alignment computation
optimally here but we shouldn't ICE when we fall back to loads of T.

The following ensures the IL remains correct by emitting VIEW_CONVERT
from T to vector(1) T when needed. It also removes an earlier fix
done in r9-382-gbb4e47476537f6 for the same issue with VMAT_ELEMENTWISE.

PR tree-optimization/118405
* tree-vect-stmts.cc (vectorizable_load): When we fall back
to scalar loads make sure we properly convert to vector(1) T
when there was only a single vector element.

Fortran: Add LOCALITY support for DO_CONCURRENT

This patch provided by Anuj Mohite as part of the GSoC project.
It is modified slightly by Jerry DeLisle for minor formatting.
The patch provides front-end parsing of the LOCALITY specs in
DO_CONCURRENT and adds numerous test cases.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_code_node): Updated to use
c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
* frontend-passes.cc (index_interchange): Updated to
use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
(gfc_code_walker): Likewise.
* gfortran.h (enum locality_type): Added new enum for locality types
in DO CONCURRENT constructs.
* match.cc (match_simple_forall): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
(gfc_match_forall): Likewise.
(gfc_match_do): Implemented support for matching DO CONCURRENT locality
specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
* parse.cc (parse_do_block): Updated to use
new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
* resolve.cc (struct check_default_none_data): Added struct
check_default_none_data.
(do_concur_locality_specs_f2023): New function to check compliance
with F2023's C1133 constraint for DO CONCURRENT.
(check_default_none_expr): New function to check DEFAULT(NONE)
compliance.
(resolve_locality_spec): New function to resolve locality specs.
(gfc_count_forall_iterators): Updated to use
code->ext.concur.forall_iterator.
(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
* st.cc (gfc_free_statement): Updated to free locality specifications
and use p->ext.concur.forall_iterator.
* trans-stmt.cc (gfc_trans_forall_1): Updated to use
code->ext.concur.forall_iterator.

gcc/testsuite/ChangeLog:

* gfortran.dg/do_concurrent_10.f90: New test.
* gfortran.dg/do_concurrent_8_f2018.f90: New test.
* gfortran.dg/do_concurrent_8_f2023.f90: New test.
* gfortran.dg/do_concurrent_9.f90: New test.
* gfortran.dg/do_concurrent_all_clauses.f90: New test.
* gfortran.dg/do_concurrent_basic.f90: New test.
* gfortran.dg/do_concurrent_constraints.f90: New test.
* gfortran.dg/do_concurrent_local_init.f90: New test.
* gfortran.dg/do_concurrent_locality_specs.f90: New test.
* gfortran.dg/do_concurrent_multiple_reduce.f90: New test.
* gfortran.dg/do_concurrent_nested.f90: New test.
* gfortran.dg/do_concurrent_parser.f90: New test.
* gfortran.dg/do_concurrent_reduce_max.f90: New test.
* gfortran.dg/do_concurrent_reduce_sum.f90: New test.
* gfortran.dg/do_concurrent_shared.f90: New test.

Signed-off-by: Anuj <anujmohite001@gmail.com>

c: improve UX for -Wincompatible-pointer-types (v3) [PR116871]

PR c/116871 notes that our diagnostics about incompatible function types
could be improved.

In particular, for the case of migrating to C23 I'm seeing a lot of
build failures with signal handlers similar to this (simplified from
alsa-tools-1.2.11, envy24control/profiles.c; see rhbz#2336278):

typedef void (*__sighandler_t) (int);

extern __sighandler_t signal (int __sig, __sighandler_t __handler)
     __attribute__ ((__nothrow__ , __leaf__));

void new_process(void)
{
  void (*int_stat)();

  int_stat = signal(2,  ((__sighandler_t) 1));

  signal(2, int_stat);
}

Before this patch, cc1 fails with this message:

t.c: In function 'new_process':
t.c:18:12: error: assignment to 'void (*)(void)' from incompatible pointer type '__sighandler_t' {aka 'void (*)(int)'} [-Wincompatible-pointer-types]
   18 |   int_stat = signal(2,  ((__sighandler_t) 1));
      |            ^
t.c:20:13: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
   20 |   signal(2, int_stat);
      |             ^~~~~~~~
      |             |
      |             void (*)(void)
t.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'void (*)(void)'
   11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
      |                                          ~~~~~~~~~~~~~~~^~~~~~~~~

With this patch, cc1 emits:

t.c: In function 'new_process':
t.c:18:12: error: assignment to 'void (*)(void)' from incompatible pointer type '__sighandler_t' {aka 'void (*)(int)'} [-Wincompatible-pointer-types]
   18 |   int_stat = signal(2,  ((__sighandler_t) 1));
      |            ^
t.c:9:16: note: '__sighandler_t' declared here
    9 | typedef void (*__sighandler_t) (int);
      |                ^~~~~~~~~~~~~~
t.c:20:13: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
   20 |   signal(2, int_stat);
      |             ^~~~~~~~
      |             |
      |             void (*)(void)
t.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'void (*)(void)'
   11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
      |                                          ~~~~~~~~~~~~~~~^~~~~~~~~
t.c:9:16: note: '__sighandler_t' declared here
    9 | typedef void (*__sighandler_t) (int);
      |                ^~~~~~~~~~~~~~

showing the location of the pertinent typedef ("__sighandler_t")

Another example, simplfied from a52dec-0.7.4: src/a52dec.c
(rhbz#2336013):

typedef void (*__sighandler_t) (int);

extern __sighandler_t signal (int __sig, __sighandler_t __handler)
     __attribute__ ((__nothrow__ , __leaf__));

/* Mismatching return type.  */
static RETSIGTYPE signal_handler (int sig)
{
}

static void print_fps (int final)
{
  signal (42, signal_handler);
}

Before this patch, cc1 emits:

t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
   22 |   signal (42, signal_handler);
      |               ^~~~~~~~~~~~~~
      |               |
      |               int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
   11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
      |                                          ~~~~~~~~~~~~~~~^~~~~~~~~

With this patch cc1 emits:

t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
   22 |   signal (42, signal_handler);
      |               ^~~~~~~~~~~~~~
      |               |
      |               int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
   11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
      |                                          ~~~~~~~~~~~~~~~^~~~~~~~~
t2.c:16:19: note: 'signal_handler' declared here
   16 | static RETSIGTYPE signal_handler (int sig)
      |                   ^~~~~~~~~~~~~~
t2.c:9:16: note: '__sighandler_t' declared here
    9 | typedef void (*__sighandler_t) (int);
      |                ^~~~~~~~~~~~~~

showing the location of the pertinent fndecl ("signal_handler"), and,
as before, the pertinent typedef.

The patch also updates the colorization in the messages to visually
link and contrast the different types and typedefs.

My hope is that this make it easier for users to decipher build failures
seen with the new C23 default.

Further improvements could be made to colorization in
convert_for_assignment, and similar improvements to C++, but I'm punting
those to GCC 16.

gcc/c/ChangeLog:
PR c/116871
* c-typeck.cc (pedwarn_permerror_init): Return bool for whether a
warning was emitted.  Only call print_spelling if we warned.
(pedwarn_init): Return bool for whether a warning was emitted.
(permerror_init): Likewise.
(warning_init): Return bool for whether a
warning was emitted.  Only call print_spelling if we warned.
(class pp_element_quoted_decl): New.
(maybe_inform_typedef_location): New.
(convert_for_assignment): For OPT_Wincompatible_pointer_types,
move auto_diagnostic_group to cover all cases.  Use %e and
pp_element rather than %qT and tree to colorize the types.
Capture whether a warning was emitted, and, if it was,
show various notes: for a pointer to a function, show the
function decl, for typedef types, and show the decls.

gcc/testsuite/ChangeLog:
PR c/116871
* gcc.dg/c23-mismatching-fn-ptr-a52dec.c: New test.
* gcc.dg/c23-mismatching-fn-ptr-alsatools.c: New test.
* gcc.dg/c23-mismatching-fn-ptr.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: Add support for vec_dup to constexpr [PR118445]

With the addition of supporting operations on the SVE scalable vector types,
the vec_duplicate tree will show up in expressions and the constexpr handling
was not done for this tree code.
This is a simple fix to treat VEC_DUPLICATE like any other unary operator and allows
the constexpr-add-1.C testcase to work.

Built and tested for aarch64-linux-gnu.

PR c++/118445

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Handle VEC_DUPLICATE like
a "normal" unary operator.
(potential_constant_expression_1): Likewise.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/constexpr-add-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: Expand shift count in Xmode in interleave pattern.

Hi,

currently ssa-dse-1.C ICEs because expand_simple_binop returns NULL
when building the scalar that is used to IOR two interleaving
sequences.

That's because we try to emit a shift in HImode. This patch shifts in
Xmode and then lowpart-subregs the result to HImode.

Regtested on rv64gcv_zvl512b.

Regards
Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Shift in Xmode.

rs6000: Add clobber and guard for vsx_stxvd2x4_le_const [PR116030]

Previously, vsx_stxvd2x4_le_const_<mode> was introduced for 'split1' pass,
so it is guarded by "can_create_pseudo_p ()". While it would be possible
to match the pattern of this insn during/after RA, this insn could be
updated to make it work for split pass after RA.

And this insn would not be the best choice if the address has alignment like
"&(-16)", so "!altivec_indexed_or_indirect_operand" is added to guard this insn.

2025-01-13 Jiufu Guo <guojiufu@linux.ibm.com>

gcc/
PR target/116030
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): Add clobber
and guard with !altivec_indexed_or_indirect_operand.

gcc/testsuite/
PR target/116030
* gcc.target/powerpc/pr116030.c: New test.

Daily bump.

RISC-V: Disallow negative step for interleaving [PR117682]

Hi,

in PR117682 we build an interleaving pattern

  { 1, 201, 209, 25, 161, 105, 113, 185, 65, 9,
    17, 89, 225, 169, 177, 249, 129, 73, 81, 153,
    33, 233, 241, 57, 193, 137, 145, 217, 97, 41,
    49, 121 };

with negative step expecting wraparound semantics due to -fwrapv.

For building interleaved patterns we have an optimization that
does e.g.
  {1, 209, ...} = { 1, 0, 209, 0, ...}
and
  {201, 25, ...} >> 8 = { 0, 201, 0, 25, ...}
and IORs those.

The optimization only works if the lowpart bits are zero.  When
overflowing e.g. with a negative step we cannot guarantee this.

This patch makes us fall back to the generic merge handling for negative
steps.

I'm not 100% certain we're good even for positive steps.  If the
step or the vector length is large enough we'd still overflow and
have non-zero lower bits.  I haven't seen this happen during my
testing, though and the patch doesn't make things worse, so...

Regtested on rv64gcv_zvl512b.  Let's see what the CI says.

Regards
Robin

PR target/117682

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fall back to
merging if either step is negative.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117682.c: New test.

RISC-V: testsuite: Skip test with -flto

Hi,

the zbb-rol-ror and stack_save_restore tests use the -fno-lto option and
scan the final assembly.  For an invocation like -flto ... -fno-lto the
output file we scan is still something like
  zbb-rol-ror-09.ltrans0.ltrans.s.

Therefore skip the tests when "-flto" is present.  This gets rid
of a few UNRESOLVED tests.

Regtested on rv64gcv_zvl512b.  Going to push if the CI agrees.

Regards
Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_1.c: Skip for -flto.
* gcc.target/riscv/stack_save_restore_2.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-04.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-05.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-06.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-07.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-08.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-09.c: Ditto.

PR modula2/116557: Remove physical address from COPYING.FDL

This patch removes the physical address from the COPYING.FDL and
replaces it with a URL.

gcc/m2/ChangeLog:

PR modula2/116557
* COPYING.FDL: Remove physical address and replace with a URL.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Fix typos in show_attr.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_attr): Fix typos for in_equivalence.

RISC-V: Remove zba check in bitwise and ashift reassociation [PR 115921]

The test case

    long
    test (long x, long y)
    {
      return ((x | 0x1ff) << 3) + y;
    }

is now compiled (-O2 -march=rv64g_zba) to

    li      a4,4096
    slliw   a5,a0,3
    addi    a4,a4,-8
    or      a5,a5,a4
    addw    a0,a5,a1
    ret

Despite this check was originally intended to use zba better, now
removing it actually enables the use of zba for this test case (thanks
to late combine):

    ori    a5,a0,511
    sh3add  a0,a5,a1
    ret

Obviously, bitmanip.md does not cover
(any_or (ashift (reg) (imm123)) imm) at all, and even for and it just
seems more natural splitting to (ashift (and (reg) (imm')) (imm123))
first, then let late combine to combine the outer ashift and the plus.

I've not found any test case regressed by the removal.
And "make check-gcc RUNTESTFLAGS=riscv.exp='zba-*.c'" also reports no
failure.

gcc/ChangeLog:

PR target/115921
* config/riscv/riscv.md (<optab>_shift_reverse): Remove
check for TARGET_ZBA.

gcc/testsuite/ChangeLog:

PR target/115921
* gcc.target/riscv/zba-shNadd-08.c: New test.

libphobos: Bump soname to version 6 [PR117701]

Each major release is not binary compatible with the previous.

PR d/117701

libphobos/ChangeLog:

* configure: Regenerate.
* configure.ac (libtool_VERSION): Update to 6:0:0.

Fix build for STORE_FLAG_VALUE<0 targets [PR118418]

In g:06c4cf398947b53b4bfc65752f9f879bb2d07924 I mishandled signed
comparisons of comparison results on STORE_FLAG_VALUE < 0 targets
(despite specifically referencing STORE_FLAG_VALUE in the commit
message). There, (lt TRUE FALSE) is true, although (ltu FALSE TRUE)
still holds.

Things get messy with vector modes, and since those weren't the focus
of the original commit, it seemed better to punt on them for now.
However, punting means that this optimisation no longer feels like
a natural tail-call operation. The patch therefore converts
"return simplify..." to the usual call-and-conditional-return pattern.

gcc/
PR target/118418
* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
Take STORE_FLAG_VALUE into account when handling signed comparisons
of comparison results.

RISC-V: Improve bitwise and ashift reassociation for single-bit immediate without zbs [PR 115921]

When zbs is not available, there's nothing special with single-bit
immediates and we should perform reassociation as normal immediates.

gcc/ChangeLog:

PR target/115921
* config/riscv/riscv.md (<optab>_shift_reverse): Only check
popcount_hwi if !TARGET_ZBS.

RISC-V: Fix the result error caused by not updating ratio when using "use_max_sew" to merge vsetvl

When the vsetvl instructions of the two RVV instructions are merged
using "use_max_sew", it is possible to update the sew of prev if
prev.sew < next.sew, but keep the original ratio, which is obviously
wrong. when the subsequent instructions are equal to the wrong ratio,
it is possible to generate the wrong "vsetvli zero,zero" instruction,
which will lead to unknown avl.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (demand_system::use_max_sew): Also
set the ratio for PREV.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-10.c: New test.

RISC-V: fix thinko in riscv_register_move_cost ()

This seeming benign mistake caused a massive SPEC2017 Cactu regression
(2.1 trillion insn to 2.5 trillion) wiping out all the gains from my
recent sched1 improvement. Thankfully the issue was trivial to fix even
if hard to isolate.

On BPI3:

Before bug
----------
|  Performance counter stats for './cactusBSSN_r_base-1':
|
|       4,557,471.02 msec task-clock:u                     #    1.000 CPUs utilized
|              1,245      context-switches:u               #    0.273 /sec
|                  1      cpu-migrations:u                 #    0.000 /sec
|            205,376      page-faults:u                    #   45.064 /sec
|  7,291,944,801,307      cycles:u                         #    1.600 GHz
|  2,134,835,735,951      instructions:u                   #    0.29  insn per cycle
|     10,799,296,738      branches:u                       #    2.370 M/sec
|         15,308,966      branch-misses:u                  #    0.14% of all branches
|
|     4557.710508078 seconds time elapsed

Bug
---
|  Performance counter stats for './cactusBSSN_r_base-2':
|
|       4,801,813.79 msec task-clock:u                     #    1.000 CPUs utilized
|              8,066      context-switches:u               #    1.680 /sec
|                  1      cpu-migrations:u                 #    0.000 /sec
|            203,836      page-faults:u                    #   42.450 /sec
|  7,682,826,638,790      cycles:u                         #    1.600 GHz
|  2,503,133,291,344      instructions:u                   #    0.33  insn per cycle
   ^^^^^^^^^^^^^^^^^
|     10,799,287,796      branches:u                       #    2.249 M/sec
|         16,641,200      branch-misses:u                  #    0.15% of all branches
|
|     4802.616638386 seconds time elapsed
|

Fix
---
|  Performance counter stats for './cactusBSSN_r_base-3':
|
|       4,556,170.75 msec task-clock:u                     #    1.000 CPUs utilized
|              1,739      context-switches:u               #    0.382 /sec
|                  0      cpu-migrations:u                 #    0.000 /sec
|            203,458      page-faults:u                    #   44.655 /sec
|  7,289,854,613,923      cycles:u                         #    1.600 GHz
|  2,134,854,070,916      instructions:u                   #    0.29  insn per cycle
|     10,799,296,807      branches:u                       #    2.370 M/sec
|         15,403,357      branch-misses:u                  #    0.14% of all branches
|
|     4556.445490123 seconds time elapsed

Fixes: 46888571d242 ("RISC-V: Add cr and cf constraint")
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_register_move_cost): Remove buggy
check.

Accept commas between clauses in OpenMP declare variant

Add support to the Fortran parser for the OpenMP syntax that allows a comma
after the directive name and between clauses of declare variant. The C and C++
parsers already support this syntax so only a new test is added.

gcc/fortran/ChangeLog:

* openmp.cc (gfc_match_omp_declare_variant): Match comma after directive
name and between clauses. Emit more useful diagnostics.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-2.f90: Remove error test for a comma
after the directive name. Add tests for other invalid syntaxes (extra
comma and invalid clause).
* c-c++-common/gomp/adjust-args-5.c: New test.
* gfortran.dg/gomp/adjust-args-11.f90: New test.

RISC-V: Fix program logic errors caused by data truncation on 32-bit host for zbs, such as i386

Correct logic on 64-bit host:
        ...
        bseti   a5,zero,38
        bseti   a5,a5,63
        addi    a5,a5,-1
        and     a4,a4,a5
...

Wrong logic on 32-bit host:
...
        li      a5,64
        bseti   a5,a5,31
        addi    a5,a5,-1
        and     a4,a4,a5
...

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Change
1UL/1ULL to HOST_WIDE_INT_1U.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bug.c: New test.

Add missing target directive in OpenMP dispatch Fortran runtime test

Without the target directive, the test would run on the host but still try to
use device pointers, which causes a segfault.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/dispatch-1.f90: Add missing target
directive.