git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: The expect framework might introduce CR in output

When running tests using the "sim" config, the command is launched in
non-readonly mode and the text retrieved from the expect command will
then replace all LF with CRLF. (The problem can be found in sim_load
where it calls remote_spawn without an input file).
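
As an illustration of accepting both line endings, a comparison helper could
normalise CRLF to LF before comparing. This is only a sketch and not the code
from the patch; the names normalize_newlines and same_output are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapse every CRLF pair into a single LF so that output captured
// through a pseudo-terminal compares equal to the expected text.
std::string normalize_newlines(const std::string &text)
{
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i)
    {
      if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
        continue;  // drop the CR, keep the LF that follows
      out.push_back(text[i]);
    }
  return out;
}

// Compare captured output with the expected text, accepting LF or CRLF.
bool same_output(const std::string &captured, const std::string &expected)
{
  return normalize_newlines(captured) == normalize_newlines(expected);
}
```

A lone CR that is not followed by LF is kept, so genuinely different output
still fails the comparison.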

libstdc++-v3/ChangeLog:

* testsuite/27_io/print/1.cc: Allow both LF and CRLF in test.
* testsuite/27_io/print/3.cc: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: libstdc++: Use effective-target libatomic

The test assumes libatomic.a is always available, but on some embedded
targets there is no libatomic.a, so the test fails.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic_float/compare_exchange_padding.cc:
Use effective-target libatomic_available.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303

This is a part of PR118303. It fixes
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors)
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, line 62)
for targets where the parameter on that line is subject to
TARGET_CALLEE_COPIES being true.

c-family:
PR middle-end/118303
* c-pretty-print.cc (c_pretty_printer::primary_expression) <SSA_NAME>:
Call primary_expression for all SSA_NAME_VAR nodes and instead move the
DECL_ARTIFICIAL private name stripping to...
(pp_c_tree_decl_identifier): ...here.

final: Fix get_attr_length for asm goto [PR118411]

The problem is that for an inline-asm goto, the outer rtl insn type
is jump_insn, and get_attr_length does not handle asm specially there,
unlike when the outer rtl insn type is a plain insn.

This fixes the issue by adding support for both CALL_INSN and JUMP_INSN
with asm.
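
For illustration, source of the following shape is the kind of input that
yields an asm inside a jump insn. It is a minimal sketch, assuming GCC or a
compatible compiler (asm goto is a GNU extension); branch_via_asm is a
hypothetical name and the asm template is deliberately empty, so the branch
to the label is never actually taken.

```cpp
#include <cassert>

// The asm statement names `taken' as a possible jump target, so the
// compiler has to treat the statement as a jump, even though the
// empty template always falls through.
int branch_via_asm(int x)
{
  asm goto ("" : : "r" (x) : : taken);
  return 0;  // fall-through path
taken:
  return 1;  // only reachable if the asm jumped to the label
}
```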

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/118411

gcc/ChangeLog:

* final.cc (get_attr_length_1): Handle asm for CALL_INSN
and JUMP_INSNs.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

d: Merge upstream dmd, druntime 82a5d2a7c4, phobos dbc09d823

D front-end changes:

- Import latest fixes from dmd v2.110.0-beta.1.
- Added traits `getBitfieldOffset' and `getBitfieldWidth'.
- Added trait `isCOMClass' to detect if a type is a COM class.
- Added `-fpreview=safer` which enables safety checking on
unattributed functions.

D runtime changes:

- Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.
- Added `fromHexString' and `fromHexStringAsRange' functions to
`std.digest'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 82a5d2a7c4.
* d-lang.cc (d_handle_option): Handle new option `-fpreview=safer'.
* expr.cc (ExprVisitor::NewExp): Remove gcc_unreachable for the
generation of `_d_newThrowable'.
* lang.opt: Add -fpreview=safer.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 82a5d2a7c4.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/gc/blkcache.d, core/internal/gc/blockmeta.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos dbc09d823.

libphobos: Merge upstream phobos 2a730adc0

Phobos changes:

- `std.uni' has been upgraded from Unicode 15.1.0 to 16.0.0.

libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos 2a730adc0.

c++/modules: Handle chaining already-imported local types [PR114630]

In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and setup as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.

It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.

So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.

A couple of suggested implementations from the discussion on the PR that
don't work:

- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
  doesn't handle the case where type definitions are followed by regular
  local variables, since those won't have been imported as separate
  backreferences and so the chains will diverge.

- Correcting the purviewness of GMF template instantiations to force Y
  to emit copies of the local types rather than backreferences into X is
  insufficient, as it's still possible that the local types got streamed
  in a separate cluster to the function definition, and so will be again
  referred to via regular backreferences when importing.

- Likewise, preventing the emission of function definitions where an
  import has already provided that same definition also is insufficient,
  for much the same reason.

PR c++/114630

gcc/cp/ChangeLog:

* module.cc (trees_in::core_vals) <BLOCK>: Chain a new node if
DECL_CHAIN already is set.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'

The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostics. It is much more useful to use the gfc_symbol's declared_at,
which this commit now does.

Additionally, it adds support for the 'interop' clause of OpenMP's
'dispatch' directive. As the argument order matters,
gfc_match_omp_variable_list gained a 'reverse_order' flag to use the
same order as the C/C++ parser.

gcc/fortran/ChangeLog:

* gfortran.h: Add OMP_LIST_INTEROP to the unnamed OMP_LIST_ enum.
* openmp.cc (gfc_match_omp_variable_list): Add reverse_order
boolean argument, defaulting to false.
(enum omp_mask2, OMP_DISPATCH_CLAUSES): Add OMP_CLAUSE_INTEROP.
(gfc_match_omp_clauses, resolve_omp_clauses): Handle dispatch's
'interop' clause.
* trans-decl.cc (gfc_get_extern_function_decl): Use sym->declared_at
instead of input_location as DECL_SOURCE_LOCATION.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_INTEROP.

gcc/testsuite/ChangeLog:

* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Update
xfail'ed 'dg-bogus' for the better 'declared here' location.
* gfortran.dg/gomp/dispatch-11.f90: New test.
* gfortran.dg/gomp/dispatch-12.f90: New test.

Fortran: Fix error recovery for bad component arrayspecs [PR108434]

2025-01-11 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran/
PR fortran/108434
* class.cc (generate_finalization_wrapper): To avoid memory
leaks from callocs, return immediately if the derived type
error flag is set.
* decl.cc (build_struct): If the declaration of a derived type
or class component does not have a deferred arrayspec, correct,
set the error flag of the derived type and emit an immediate
error.

gcc/testsuite/
PR fortran/108434
* gfortran.dg/pr108434.f90: Add tests from comment 1.

c++: modules and function attributes

30_threads/stop_token/stop_source/109339.cc was failing because we weren't
representing attribute access on the METHOD_TYPE for _Stop_state_ref.

The modules code expected attributes to appear on tt_variant_type and not
on tt_derived_type, but that's backwards since build_type_attribute_variant
gives a type with attributes its own TYPE_MAIN_VARIANT.

gcc/cp/ChangeLog:

* module.cc (trees_out::type_node): Write attributes for
tt_derived_type, not tt_variant_type.
(trees_in::tree_node): Likewise for reading.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-2_a.C: New test.
* g++.dg/modules/attrib-2_b.C: New test.

c++: modules and class attributes

std/time/traits/is_clock.cc was getting a warning about applying the
deprecated attribute to a variant of auto_ptr, which was wrong because it's
on the primary type. This turned out to be because we were ignoring the
attributes on the definition of auto_ptr because the forward declaration in
unique_ptr.h has no attributes. We need to merge attributes as usual in a
redeclaration.

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Merge attributes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-1_a.C: New test.
* g++.dg/modules/attrib-1_b.C: New test.

LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
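
The arithmetic behind the final immediates can be sketched as follows. The
helper names are hypothetical; the bit ranges follow the LoongArch definition
of the three instructions (lu12i.w: bits 31..12, lu32i.d: bits 51..32,
lu52i.d: bits 63..52). The helpers return the raw field and do not model how
the assembler prints signed immediates.

```cpp
#include <cassert>
#include <cstdint>

// 20-bit field loaded by lu12i.w: bits 31..12 of the constant.
constexpr std::uint32_t lu12i_w_imm(std::uint64_t value)
{
  return static_cast<std::uint32_t>((value >> 12) & 0xfffff);
}

// 20-bit field loaded by lu32i.d: bits 51..32 of the constant.
constexpr std::uint32_t lu32i_d_imm(std::uint64_t value)
{
  return static_cast<std::uint32_t>((value >> 32) & 0xfffff);
}

// 12-bit field loaded by lu52i.d: bits 63..52 of the constant.
constexpr std::uint32_t lu52i_d_imm(std::uint64_t value)
{
  return static_cast<std::uint32_t>((value >> 52) & 0xfff);
}

// The example from the commit message: 0x1010000 >> 12 is 0x1010.
static_assert(lu12i_w_imm(0x1010000) == 0x1010, "final lu12i.w immediate");
```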

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: Check that ">>" is not generated.

d: Merge dmd, druntime 2b89c2909d, phobos bdedad3bf

D front-end changes:

- Import latest fixes from dmd v2.110.0-beta.1.

D runtime changes:

- Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.
- Added `popGrapheme' function to `std.uni'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 2b89c2909d.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/basicmangle.o to
d/mangle-basic.o, d/cppmangle.o to d/mangle-cpp.o, and d/dmangle.o to
d/mangle-package.o.
(d/mangle-%.o): New rule.
* d-builtins.cc (maybe_set_builtin_1): Update for new front-end
interface.
* d-diagnostic.cc (verrorReport): Likewise.
(verrorReportSupplemental): Likewise.
* d-frontend.cc (getTypeInfoType): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Likewise.
* d-target.cc (TargetC::contributesToAggregateAlignment): New.
* d-tree.h (create_typeinfo): Adjust prototype.
* decl.cc (layout_struct_initializer): Update for new front-end
interface.
* typeinfo.cc (create_typeinfo): Remove generate parameter.
* types.cc (layout_aggregate_members): Update for new front-end
interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 2b89c2909d.
* src/MERGE: Merge upstream phobos bdedad3bf.

Use relations when simplifying MIN and MAX.

Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.
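
The idea can be sketched with a toy model: once the relation between op0 and
op1 is known, MIN and MAX reduce to one of the operands. The enums and
function names below are hypothetical and are not the vr-values.cc
interfaces; floating-point corner cases such as signed zeros are ignored.

```cpp
#include <cassert>

// Known relation of op0 to op1 at the point of the MIN/MAX.
enum class Relation { Unknown, Less, LessEqual, Greater, GreaterEqual, Equal };

// Which operand the expression folds to, if any.
enum class Operand { Op0, Op1, Undetermined };

// MIN (op0, op1) is op0 when op0 <= op1 is known, op1 when op0 >= op1.
Operand fold_min(Relation op0_vs_op1)
{
  switch (op0_vs_op1)
    {
    case Relation::Less:
    case Relation::LessEqual:
    case Relation::Equal:
      return Operand::Op0;
    case Relation::Greater:
    case Relation::GreaterEqual:
      return Operand::Op1;
    case Relation::Unknown:
      break;
    }
  return Operand::Undetermined;
}

// MAX picks the operand that MIN does not pick.
Operand fold_max(Relation op0_vs_op1)
{
  switch (fold_min(op0_vs_op1))
    {
    case Operand::Op0:
      return Operand::Op1;
    case Operand::Op1:
      return Operand::Op0;
    case Operand::Undetermined:
      break;
    }
  return Operand::Undetermined;
}
```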

Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).

PR tree-optimization/88575
gcc/
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
relation between op0 and op1 and utilize it.
(simplify_using_ranges::simplify): Do not eliminate float checks.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-27.c: Disable VRP.
* gcc.dg/tree-ssa/minmax-27e.c: New.
* gcc.dg/tree-ssa/minmax-27f.c: New.
* gcc.dg/tree-ssa/minmax-28.c: Disable VRP.
* gcc.dg/tree-ssa/minmax-28e.c: New.
* gcc.dg/tree-ssa/minmax-28f.c: New.

Daily bump.

d: Merge dmd, druntime 4ccb01fde5, phobos eab6595ad

D front-end changes:

- Added pragma for ImportC to allow setting `nothrow', `@nogc'
or `pure'.
- Mixin templates can now use assignment syntax.

D runtime changes:

- Removed `ThreadBase.criticalRegionLock' from `core.thread'.
- Added `expect', `[un]likely', `trap' to `core.builtins'.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4ccb01fde5.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/foreachvar.o to
d/visitor-foreachvar.o, d/visitor.o to d/visitor-package.o, and
d/statement_rewrite_walker.o to d/visitor-statement_rewrite_walker.o.
(D_FRONTEND_OBJS): Rename
d/{parsetime,permissive,postorder,transitive}visitor.o to
d/visitor-{parsetime,permissive,postorder,transitive}.o.
(D_FRONTEND_OBJS): Remove d/sapply.o.
(d.tags): Add dmd/common/*.h.
(d/visitor-%.o:): New rule.
* d-codegen.cc (get_frameinfo): Update for new front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4ccb01fde5.
* src/MERGE: Merge upstream phobos eab6595ad.

d: Merge dmd, druntime 6884b433d2, phobos 48d581a1f

D front-end changes:

- It's now deprecated to declare `auto ref' parameters without
  putting those two keywords next to each other.
- An error is now given for case fallthrough for multivalued
  cases.
- An error is now given for constructors with field destructors
  with stricter attributes.
- An error is now issued for `in'/`out' contracts of `nothrow'
  functions that may throw.
- `auto ref' can now be applied to local, static, extern, and
  global variables.

D runtime changes:

- Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 6884b433d2.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
(d_build_builtins_module): Likewise.
(matches_builtin_type): Likewise.
(covariant_with_builtin_type_p): Likewise.
* d-codegen.cc (lower_struct_comparison): Likewise.
(call_side_effect_free_p): Likewise.
* d-compiler.cc (Compiler::paintAsType): Likewise.
* d-convert.cc (convert_expr): Likewise.
(convert_for_assignment): Likewise.
* d-target.cc (Target::isVectorTypeSupported): Likewise.
(Target::isVectorOpSupported): Likewise.
(Target::isReturnOnStack): Likewise.
* decl.cc (get_symbol_decl): Likewise.
* expr.cc (build_return_dtor): Likewise.
* imports.cc (class ImportVisitor): Likewise.
* toir.cc (class IRVisitor): Likewise.
* types.cc (class TypeVisitor): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 6884b433d2.
* src/MERGE: Merge upstream phobos 48d581a1f.

vect: Also cost gconds for scalar [PR118211]

Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.

This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
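
For reference, the loop shape in question looks like the sketch below: a
search over 64-bit elements with an early exit in addition to the normal
loop exit. The helper names index_of and index_of_loop are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first element equal to key, or v.size () if absent.
std::size_t index_of(const std::vector<std::uint64_t> &v, std::uint64_t key)
{
  auto it = std::find(v.begin(), v.end(), key);
  return static_cast<std::size_t>(it - v.begin());
}

// Hand-written equivalent: the comparison inside the body is the early
// exit, the loop bound check is the normal exit.
std::size_t index_of_loop(const std::uint64_t *data, std::size_t n,
                          std::uint64_t key)
{
  for (std::size_t i = 0; i < n; ++i)
    if (data[i] == key)
      return i;
  return n;
}
```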

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Don't skip over gconds.

vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211]

This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard.  Specifically
in the case where the loop niters are unknown at compile time we used to
check:

  !LOOP_REQUIRES_VERSIONING (loop_vinfo)

but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling.  I think
this condition should instead be checking specifically if we aren't
versioning for alignment.

As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.

With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
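
The failure mode can be modelled with a small scalar simulation. This is a
sketch under stated assumptions, not the vectorizer's code: the vector body
is modelled as a bottom-tested loop that always runs at least once when
entered, and elements_touched and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Count how many elements a peeled + vectorized loop touches.
//   niters:          scalar iterations the original loop would run
//   peel:            prolog iterations peeled for alignment
//   vf:              elements handled per vector iteration
//   have_skip_guard: whether the vector body is skipped when fewer than
//                    vf iterations remain after the prolog
long elements_touched(long niters, long peel, long vf, bool have_skip_guard)
{
  assert(niters >= 0 && peel >= 0 && vf > 0);
  long done = std::min(peel, niters);  // prolog
  long remaining = niters - done;

  bool enter_vector_body = !have_skip_guard || remaining >= vf;
  if (enter_vector_body)
    do
      {
        done += vf;  // one full vector iteration
        remaining -= vf;
      }
    while (remaining >= vf);

  if (remaining > 0)
    done += remaining;  // scalar epilogue
  return done;
}
```

With niters = 5, peel = 3 and vf = 4, the guarded version touches 5 elements,
while the unguarded version touches 7, i.e. it runs past the end of the
buffer.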

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/vect/vect-early-break_130.c: New test.

vect: Fix dominators when adding a guard to skip the vector loop [PR118211]

The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance. This patch fixes that.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge. Initialize prolog variable to
NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* g++.dg/vect/vect-early-break_6.cc: New test.

Co-Authored-By: Alex Coplan <alex.coplan@arm.com>

vect: Don't guard scalar epilogue for inverted loops [PR118211]

For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
epilogue guard for inverted early-exit loops.

vect: Force alignment peeling to vectorize more early break loops [PR118211]

This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.
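
The reasoning can be checked with a little address arithmetic. The sketch
below assumes the vector size divides the page size; the helper names and
the 4096-byte page used in the examples are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True if the access [addr, addr + size) spans two pages.
bool crosses_page(std::uint64_t addr, std::uint64_t size,
                  std::uint64_t page_size)
{
  assert(size > 0 && page_size > 0);
  return addr / page_size != (addr + size - 1) / page_size;
}

// Number of scalar iterations to peel so that the next access starts at
// a multiple of vec_size bytes.
std::uint64_t scalar_iters_to_peel(std::uint64_t addr, std::uint64_t elem_size,
                                   std::uint64_t vec_size)
{
  assert(elem_size > 0 && vec_size % elem_size == 0 && addr % elem_size == 0);
  std::uint64_t misalign = addr % vec_size;
  return misalign == 0 ? 0 : (vec_size - misalign) / elem_size;
}
```

For example, a 16-byte load at address 4088 crosses a 4096-byte page
boundary; peeling two 4-byte iterations moves the first vector load to 4096,
which cannot cross a page.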

To make this work for VLA architectures we have to allow compile-time
non-constant target alignments. We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization. Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
* gcc.dg/tree-ssa/cunroll-14.c: Likewise.
* gcc.dg/unroll-6.c: Likewise.
* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
* gcc.dg/vect/vect-104.c: Expect to vectorize.
* gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
* gcc.dg/vect/vect-early-break_3.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-early-break_8.c: Likewise.
* gfortran.dg/vect/vect-5.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gcc.dg/vect/vect-switch-search-line-fast.c:

Co-Authored-By: Tamar Christina <tamar.christina@arm.com>

AArch64: correct Cortex-X4 MIDR

The Parts Num field for the MIDR for Cortex-X4 is wrong. It's currently the
parts number for a Cortex-A720 (which does have the right number).

The correct number can be found in the Cortex-X4 Technical Reference Manual [1]
on page 382 in Issue Number 5.

[1] https://developer.arm.com/documentation/102484/latest/

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
num.

d: Merge dmd, druntime 34875cd6e1, phobos ebd24da8a

D front-end changes:

- Import dmd v2.110.0-beta.1.
- `ref' can now be applied to local, static, extern, and global
  variables.

D runtime changes:

- Import druntime v2.110.0-beta.1.

Phobos changes:

- Import phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 34875cd6e1.
* dmd/VERSION: Bump version to v2.110.0-beta.1.
* Make-lang.in (D_FRONTEND_OBJS): Add d/deps.o, d/timetrace.o.
* decl.cc (class DeclVisitor): Update for new front-end interface.
* expr.cc (class ExprVisitor): Likewise.
* typeinfo.cc (check_typeinfo_type): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 34875cd6e1.
* src/MERGE: Merge upstream phobos ebd24da8a.

libstdc++: Fix unused parameter warnings in <bits/atomic_futex.h>

This fixes warnings like the following during bootstrap:

sparc-sun-solaris2.11/libstdc++-v3/include/bits/atomic_futex.h:324:53: warning: unused parameter ‘__mo’ [-Wunused-parameter]
324 | _M_load_when_equal(unsigned __val, memory_order __mo)
| ~~~~~~~~~~~~~^~~~

libstdc++-v3/ChangeLog:

* include/bits/atomic_futex.h (__atomic_futex_unsigned): Remove
names of unused parameters in non-futex implementation.

c++: add fixed test [PR118391]

Fixed by r15-6740.

PR c++/118391

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval20.C: New test.

libatomic: Cleanup AArch64 ifunc selection

Simplify and clean up the ifunc selection logic. Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check that LSE2 is enabled.
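
The shape of the check can be sketched as below. The CpuFeatures struct is a
hypothetical stand-in for however the features are detected; only the
ordering of the checks is taken from the description above.

```cpp
#include <cassert>

// Hypothetical summary of the detected CPU features.
struct CpuFeatures
{
  bool lse2;
  bool rcpc3;
};

bool has_lse2(const CpuFeatures &f)
{
  return f.lse2;
}

// LRCPC3 does not imply LSE2, so check LSE2 first and bail out early.
bool has_rcpc3(const CpuFeatures &f)
{
  if (!has_lse2(f))
    return false;
  return f.rcpc3;
}
```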

Passes regress and bootstrap, OK for commit?

libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(has_rcpc3): Add early check for LSE2.

testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test

Since armv8-m.base uses thumb1, which does not support sibcall/tailcall,
a pattern is needed that uses a PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Do not call cp_parser_omp_dispatch directly in cp_parser_pragma

This is a followup to
ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.

The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
with cp_parser_omp_construct and check context.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/dispatch-8.C: New test.

c++: Fix ICE with invalid defaulted operator <=> [PR118387]

In the following testcase there are 2 issues: one is that B doesn't
have operator<=>, and the other is that A's operator<=> has int return
type, i.e. not a standard comparison category.
Because of the int return type, retcat is cc_last, so tentative is false.
When we first try to synthetize the operator, complain is tf_none; we
find that B doesn't have operator<=>, and because (tentative || complain)
is false, we don't try to search for other operators in
genericize_spaceship.  We then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_warning_or_error.
do_one_comp will first do:
  tree comp = build_new_op (loc, code, flags, lhs, rhs,
                            NULL_TREE, NULL_TREE, &overload,
                            tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
  if (tag == cc_last && is_auto (type))
    {
...
    }

  gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.

The patch fixes it by returning error_mark_node from genericize_spaceship
instead of failing the assertion.

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthetize
operator<=> which will crash at runtime due to returning without a return
statement.  That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthetize with tf_none, see that the static_cast is invalid, and
silently don't add the error_mark_node statement, but as the function isn't
deleted, we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
is invalid?

2025-01-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/118387
* method.cc (genericize_spaceship): For tag == cc_last if
type is not auto just return error_mark_node instead of failing
checking assertion.

* g++.dg/cpp2a/spaceship-synth17.C: New test.

c++: modules and DECL_REPLACEABLE_P

We need to remember that the ::operator new is replaceable to avoid a bogus
error about __builtin_operator_new finding a non-replaceable function.

This affected __get_temporary_buffer in stl_tempbuf.h.

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Write replaceable_operator.
(trees_in::core_bools): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/operator-2_a.C: New test.
* g++.dg/modules/operator-2_b.C: New test.

Fix some memory leaks

The following fixes memory leaks found compiling SPEC CPU 2017 with
valgrind.

* df-core.cc (rest_of_handle_df_finish): Release dflow for
problems without free function (like LR).
* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
Release loop_bbs on all exits.
* tree-vectorizer.h (supportable_indirect_convert_operation): Change.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
converts.
(supportable_indirect_convert_operation): Get a reference to
the output vector of converts.

[PR118017][LRA]: Fix test for i686

My previous patch for PR118017 contains a test which fails on i686. This patch fixes it.

gcc/testsuite/ChangeLog:

PR target/118017
* gcc.target/i386/pr118017.c: Check target int128.

arm: [MVE intrinsics] Fix tuples field name (PR 118332)

The previous fix only worked for C; for C++ we need to add more
information to the underlying type so that
finish_class_member_access_expr accepts it.

We use the same logic as in aarch64's register_tuple_type for AdvSIMD
tuples.

This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
mode.

gcc/ChangeLog:

PR target/118332
* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
(register_type_decl): Delete.
(register_builtin_tuple_types): Use
lang_hooks.types.simulate_record_decl.

Fix bootstrap on !HARDREG_PRE_REGNOS targets

Pushed as obvious.

* gcse.cc (pass_hardreg_pre::gate): Wrap possibly unused
fun argument.

rtl-optimization/117467 - limit ext-dce memory use

The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers. The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.

This doesn't in any way address the implementation issues of the pass,
but it reduces the memory-use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.
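
The kind of check involved can be sketched as below. The estimate (one bit
per basic block and pseudo register pair) and the function name are
assumptions for illustration only; the actual estimate in ext-dce.cc may
differ. The limit is in kilobytes and is passed in by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Return true if the estimated bitmap memory exceeds the limit.
// Assumes n_basic_blocks * n_pseudos does not overflow 64 bits.
bool ext_dce_too_expensive(std::uint64_t n_basic_blocks,
                           std::uint64_t n_pseudos,
                           std::uint64_t max_memory_kb)
{
  std::uint64_t bits = n_basic_blocks * n_pseudos;  // one bit per pair
  std::uint64_t kilobytes = bits / 8 / 1024;
  return kilobytes > max_memory_kb;
}
```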

PR rtl-optimization/117467
PR rtl-optimization/117934
* ext-dce.cc (ext_dce_execute): Do nothing if a memory
allocation estimate exceeds what is allowed by
--param max-gcse-memory.

c++: ICE with pack indexing and partial inst [PR117937]

Here we ICE in expand_expr_real_1:

      if (exp)
        {
          tree context = decl_function_context (exp);
          gcc_assert (SCOPE_FILE_SCOPE_P (context)
                      || context == current_function_decl

on something like this test:

  void
  f (auto... args)
  {
    [&]<size_t... i>(seq<i...>) {
      g(args...[i]...);
    }(seq<0>());
  }

because while current_function_decl is:

  f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]

(correct), context is:

  f<int>(int)::<lambda(seq<i ...>)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it.  The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator().  So we should let retrieve_local_specialization
find the right args#0.

PR c++/117937

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.

s390: Add expander for uaddc/usubc optabs

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Ditto.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.

gcc/testsuite/ChangeLog:

* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.

docs: Document new hardreg PRE pass

gcc/ChangeLog:

* doc/passes.texi: Document hardreg PRE pass.

Add new hardreg PRE pass

This pass is used to optimise assignments to the FPMR register in
aarch64.  I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.

Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later.  This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.

Any uses of the hardreg in debug insns will be deleted.  We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).

gcc/ChangeLog:

* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New functions.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.

Disable a broken multiversioning optimisation

This patch skips redirect_to_specific clone for aarch64 and riscv,
because the optimisation has two flaws:

1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.

2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time.  A correct check would
need to verify that:
- if the current caller version were selected at runtime, then the
   chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
   a higher priority caller version would have been eligible for
   selection (and hence the current caller version wouldn't have been
   selected).

The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
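
As a rough illustration of what "can be determined statically" means
here, the two conditions can be checked by brute force in a toy model.
This is only a sketch under stated assumptions, not the code in
multiple_target.cc: versions are modelled as required-feature bitmasks,
a larger array index means higher priority, and the names
select_version and redirect_is_safe are hypothetical.

```c
#include <stdbool.h>

/* Toy model (an assumption, not GCC's data structures): each version
   requires a set of CPU features, given as a bitmask.  A larger array
   index means a higher priority.  Index 0 is the default version.  */
#define NFEATURES 3

/* Pick the highest-priority version whose required features are all
   present in CPU, or -1 if none is eligible.  */
int
select_version (const unsigned *req, int n, unsigned cpu)
{
  for (int i = n - 1; i >= 0; i--)
    if ((req[i] & ~cpu) == 0)
      return i;
  return -1;
}

/* Return true if, on every CPU for which caller version CALLER is the
   one selected, the callee resolves to version CALLEE.  Only then could
   a call from CALLER be redirected to CALLEE at compile time.  */
bool
redirect_is_safe (const unsigned *caller_req, int ncaller, int caller,
                  const unsigned *callee_req, int ncallee, int callee)
{
  for (unsigned cpu = 0; cpu < (1u << NFEATURES); cpu++)
    if (select_version (caller_req, ncaller, cpu) == caller
        && select_version (callee_req, ncallee, cpu) != callee)
      return false;
  return true;
}
```

With caller versions { default, A } and callee versions
{ default, A, A+B }, redirecting caller "A" to callee "A" is rejected,
because a CPU with both A and B selects callee "A+B" while no higher
priority caller version exists.  Redirecting "default" to "default" is
accepted.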

Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage.  However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.

Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.

gcc/ChangeLog:

* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-pragma.C: New test.

docs: Add new AArch64 flags

gcc/ChangeLog:

* doc/invoke.texi: Add new AArch64 flags.

aarch64: Add new +xs flag

GCC does not emit tlbi instructions, so this only affects the flags
passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add XS.
* config/aarch64/aarch64-option-extensions.def (XS): New flag.

aarch64: Add new +wfxt flag

GCC does not currently emit the wfet or wfit instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
* config/aarch64/aarch64-option-extensions.def (WFXT): New flag.

aarch64: Add new +rcpc2 flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
* config/aarch64/aarch64-option-extensions.def
(RCPC2): New flag.
(RCPC3): Add RCPC2 dependency.
* config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to
expected feature string instead of rcpc.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +flagm2 flag

GCC does not currently emit the axflag or xaflag instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
* config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to
expected feature string instead of flagm.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +frintts flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
* config/aarch64/arm_neon.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +jscvt flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +fcma flag

This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.

Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
(SVE): Add FCMA dependency.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_15: Add fcma to Features.
* gcc.target/aarch64/cpunative/info_16: Ditto.
* gcc.target/aarch64/cpunative/info_17: Ditto.
* gcc.target/aarch64/cpunative/info_8: Ditto.
* gcc.target/aarch64/cpunative/info_9: Ditto.

aarch64: Use PAUTH instead of V8_3A in some places

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_expand_epilogue): Use TARGET_PAUTH.
* config/aarch64/aarch64.md: Update comment.

c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]

Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.

2025-01-10 Jakub Jelinek <jakub@redhat.com>

PR c/118376
* c-parser.cc (c_parser_postfix_expression): Call
set_c_expr_source_range before break in the __builtin_stdc_rotate_*
case.

* gcc.dg/pr118376.c: New test.

rtl: Remove invalid compare simplification [PR117186]

g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
added code to treat:

  (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))

as a nop.  This PR shows that that isn't always correct.
The compare in the set above is between two 0/1 booleans (at least
on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
produced the incoming (reg:CC cc) is unconstrained; it could be between
arbitrary integers, or even floats.  The fold is therefore replacing a
cc that is valid for both signed and unsigned comparisons with one that
is only known to be valid for signed comparisons.

  (gt (compare (gt cc 0) (lt cc 0)) 0)

does simplify to:

  (gt cc 0)

but:

  (gtu (compare (gt cc 0) (lt cc 0)) 0)

does not simplify to:

  (gtu cc 0)
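
The difference can be reproduced in plain C.  The sketch below assumes
that the incoming cc came from comparing two 32-bit integers X and Y,
which is only one of the possibilities mentioned above, and the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (gtu (compare (gt cc 0) (lt cc 0)) 0): the outer compare sees the two
   0/1 results of signed tests on X and Y.  */
bool
gtu_of_boolean_compare (int32_t x, int32_t y)
{
  unsigned gt = x > y;
  unsigned lt = x < y;
  return gt > lt;
}

/* (gtu cc 0) on the original cc: an unsigned test of X against Y.  This
   is what the removed fold effectively produced.  */
bool
gtu_of_original_cc (int32_t x, int32_t y)
{
  return (uint32_t) x > (uint32_t) y;
}

/* (gt cc 0) on the original cc: a signed test of X against Y.  */
bool
gt_of_original_cc (int32_t x, int32_t y)
{
  return x > y;
}
```

For X = -1 and Y = 0, gtu_of_boolean_compare returns false while
gtu_of_original_cc returns true, so treating the set as a nop changes
the result.  The signed variant agrees with gt_of_original_cc for all
inputs.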

The optimisation didn't come with a testcase, but it was added for
i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
as it once did, since it's now conditional on -minline-all-stringops.
But the patch is almost 25 years old, so whatever the original
motivation was, it seems likely that other things now rely on it.

It therefore seems better to try to preserve the optimisation on rtl
rather than get rid of it.  To do that, we need to look at how the
result of the outer compare is used.  We'd therefore be looking at four
instructions (the gt, the lt, the compare, and the use of the compare),
but combine already allows that for 3-instruction combinations thanks
to:

  /* If the source is a COMPARE, look for the use of the comparison result
     and try to simplify it unless we already have used undobuf.other_insn.  */

When applied to boolean inputs, a comparison operator is
effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
simplify_logical_relational_operation already had code to simplify
logical operators between two comparison results, but:

* It only handled IOR, which doesn't cover all the cases needed here.
  The others are easily added.

* It treated comparisons of integers as having an ORDERED/UNORDERED result.
  Therefore:

  * it would not treat "true for LT + EQ + GT" as "always true" for
    comparisons between integers, because the mask excluded the UNORDERED
    condition.

  * it would try to convert "true for LT + GT" into LTGT even for comparisons
    between integers.  To prevent an ICE later, the code used:

       /* Many comparison codes are only valid for certain mode classes.  */
       if (!comparison_code_valid_for_mode (code, mode))
         return 0;

    However, this used the wrong mode, since "mode" is here the integer
    result of the comparisons (and the mode of the IOR), not the mode of
    the things being compared.  Thus the effect was to reject all
    floating-point-only codes, even when comparing floats.

  I think instead the code should detect whether the comparison is between
  integer values and remove UNORDERED from consideration if so.  It then
  always produces a valid comparison (or an always true/false result),
  and so comparison_code_valid_for_mode is not needed.  In particular,
  "true for LT + GT" becomes NE for comparisons between integers but
  remains LTGT for comparisons between floats.

* There was a missing check for whether the comparison inputs had
  side effects.
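
The mask handling described in these points can be sketched as
follows.  This is an illustration only, not the code in
simplify-rtx.cc: the bit assignment (LT = 8, GT = 4, EQ = 2,
UNORDERED = 1) is an assumption, and combine_masks and mask_to_name
are hypothetical names.

```c
#include <stdbool.h>

/* One bit per possible outcome of comparing two values.  The exact bit
   values are an assumption made for this sketch.  */
enum { M_UNORDERED = 1, M_EQ = 2, M_GT = 4, M_LT = 8 };

enum logic_op { OP_IOR, OP_AND, OP_XOR };

/* Combine the outcome sets of two comparisons of the same operands.  */
unsigned
combine_masks (enum logic_op op, unsigned m0, unsigned m1)
{
  switch (op)
    {
    case OP_IOR: return m0 | m1;
    case OP_AND: return m0 & m1;
    default:     return m0 ^ m1;
    }
}

/* Name the comparison for MASK.  For integer operands the UNORDERED bit
   is dropped first, so "LT + EQ + GT" is always true and "LT + GT" is
   NE rather than LTGT.  */
const char *
mask_to_name (unsigned mask, bool integer_operands)
{
  static const char *const names[16] = {
    "false", "UNORDERED", "EQ", "UNEQ", "GT", "UNGT", "GE", "UNGE",
    "LT", "UNLT", "LE", "UNLE", "LTGT", "NE", "ORDERED", "true"
  };
  mask &= 15;
  if (integer_operands)
    {
      mask &= ~(unsigned) M_UNORDERED;
      if (mask == (M_LT | M_EQ | M_GT))
        return "true";
      if (mask == (M_LT | M_GT))
        return "NE";
    }
  return names[mask];
}
```

For example, combining LT and GT with OP_IOR gives "NE" for integer
operands and "LTGT" for floating-point operands, and combining LE and
GE with OP_AND gives "EQ" in both cases.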

While there, it also seemed worth extending
simplify_logical_relational_operation to unsigned comparisons, since
that makes the testing easier.

As far as that testing goes: the patch exhaustively tests all
combinations of integer comparisons in:

  (cmp1 (cmp2 X Y) (cmp3 X Y))

for the 10 integer comparisons, giving 1000 fold attempts in total.
It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
on the result of the fold, giving 9 checks per fold, or 9000 in total.
That's probably more than is typical for self-tests, but it seems to
complete in negligible time, even for -O0 builds.
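
The shape of that enumeration can be sketched in standalone C.  This
is not the self-test added by the patch (test_comparisons works on
RTL and compares each fold against the unfolded expression); it only
walks the same 1000 x 9 space on plain ints and counts how many
combinations always give the same answer.  All names are hypothetical.

```c
/* The 10 integer comparisons.  */
enum cmp_code { EQ, NE, LT, LE, GT, GE, LTU, LEU, GTU, GEU, NUM_CMP };

/* Evaluate comparison CODE on X and Y, returning 0 or 1.  */
int
eval_cmp (enum cmp_code code, int x, int y)
{
  unsigned ux = (unsigned) x, uy = (unsigned) y;
  switch (code)
    {
    case EQ:  return x == y;
    case NE:  return x != y;
    case LT:  return x < y;
    case LE:  return x <= y;
    case GT:  return x > y;
    case GE:  return x >= y;
    case LTU: return ux < uy;
    case LEU: return ux <= uy;
    case GTU: return ux > uy;
    case GEU: return ux >= uy;
    default:  return 0;
    }
}

/* Evaluate (cmp1 (cmp2 X Y) (cmp3 X Y)) for every combination of the
   three comparisons and every X, Y in {-1, 0, 1}.  Return the number of
   evaluations and store in *CONSTANT_RESULTS the number of combinations
   whose value does not depend on X and Y.  */
int
run_exhaustive_checks (int *constant_results)
{
  static const int vals[] = { -1, 0, 1 };
  int checks = 0;

  *constant_results = 0;
  for (int c1 = 0; c1 < NUM_CMP; c1++)
    for (int c2 = 0; c2 < NUM_CMP; c2++)
      for (int c3 = 0; c3 < NUM_CMP; c3++)
        {
          int seen[2] = { 0, 0 };
          for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
              {
                int a = eval_cmp ((enum cmp_code) c2, vals[i], vals[j]);
                int b = eval_cmp ((enum cmp_code) c3, vals[i], vals[j]);
                seen[eval_cmp ((enum cmp_code) c1, a, b)] = 1;
                checks++;
              }
          if (!(seen[0] && seen[1]))
            (*constant_results)++;
        }
  return checks;
}
```

run_exhaustive_checks performs 10 * 10 * 10 * 9 = 9000 evaluations,
matching the count given above.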

gcc/
PR rtl-optimization/117186
* rtl.h (simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.
* simplify-rtx.cc (unsigned_comparison_to_mask): New function.
(mask_to_unsigned_comparison): Likewise.
(comparison_code_valid_for_mode): Delete.
(simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.  Handle AND and XOR.  Handle unsigned
comparisons.  Handle always-false results.  Ignore the low bit
of the mask if the operands are always ordered and remove the
then-redundant check of comparison_code_valid_for_mode.  Check
for side-effects in the operands before simplifying them away.
(simplify_context::simplify_binary_operation_1): Remove
simplification of (compare (gt ...) (lt ...)) and instead...
(simplify_context::simplify_relational_operation_1): ...handle
comparisons of comparisons here.
(test_comparisons): New function.
(test_scalar_ops): Call it.

gcc/testsuite/
PR rtl-optimization/117186
* gcc.dg/torture/pr117186.c: New test.
* gcc.target/aarch64/pr117186.c: Likewise.

[ifcombine] drop other misuses of uniform_integer_cst_p

As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
ifcombine makes no sense, we're not dealing with vectors. Indeed,
I've been misunderstanding and misusing it since I cut&pasted it from
some preexisting match predicate in an earlier version of the ifcombine
field-merge patch.

for gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Drop misuses of
uniform_integer_cst_p.
(fold_truth_andor_for_ifcombine): Likewise.

[ifcombine] fix mask variable test to match use [PR118344]

There was a cut&pasto in the rr_and_mask's adjustment to match the
combined type: the test on whether there was a mask already was
testing the wrong variable, and then it might crash or otherwise fail
accessing an undefined mask.  This only hit with checking enabled,
and rarely at that.

for  gcc/ChangeLog

PR tree-optimization/118344
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
rr_and_mask's type adjustment test.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118344
* gcc.dg/field-merge-19.c: New.

[ifcombine] reuse left-hand mask to decode right-hand xor operand

If fold_truth_andor_for_ifcombine applies a mask to an xor, say
because the result of the xor is compared with a power of two [minus
one], we have to apply the same mask when processing both the left-
and right-hand xor paths for the transformation to be sound.  Arrange
for decode_field_reference to propagate the incoming mask along with
the expression to the right-hand operand.

Don't require the right-hand xor operand to be a constant, that was a
cut&pasto.
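
The soundness argument can be seen in a small standalone example.
This is a sketch only; the function names are hypothetical and it does
not reflect how decode_field_reference represents the operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* The original test: mask the result of the xor.  */
bool
xor_then_mask (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) == 0;
}

/* Equivalent form: the same mask applied to both xor operands.  */
bool
mask_both_operands (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == (b & mask);
}

/* Unsound form: the mask applied to the left-hand operand only.  */
bool
mask_left_operand_only (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == b;
}
```

With a = 0x1234, b = 0x5634 and mask = 0xff, the first two functions
return true, but mask_left_operand_only returns false because the
unmasked upper bits of b take part in the comparison.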

for  gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
Propagate pand_mask to the right-hand xor operand.  Don't
require the right-hand xor operand to be a constant.
(fold_truth_andor_for_ifcombine): Pass right-hand mask when
appropriate.

[ifcombine] adjust for narrowing converts before shifts [PR118206]

A narrowing conversion and a shift both drop bits from the loaded
value, but we need to take into account which one comes first to get
the right number of bits and mask.

Fold when applying masks to parts, comparing the parts, and combining
the results, in the odd chance either mask happens to be zero.
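
A standalone example of why the order matters, assuming an 8-bit
narrowing of a 32-bit load.  The helper names are hypothetical and the
shift count is assumed to be less than 8.

```c
#include <stdint.h>

/* Shift first, then narrow to 8 bits: keeps source bits
   [shift, shift + 8).  */
uint32_t
shift_then_narrow (uint32_t x, unsigned shift)
{
  return (uint8_t) (x >> shift);
}

/* Narrow to 8 bits first, then shift: keeps only source bits
   [shift, 8).  */
uint32_t
narrow_then_shift (uint32_t x, unsigned shift)
{
  return (uint32_t) ((uint8_t) x >> shift);
}

/* Masks of the loaded bits that survive in each case.  */
uint32_t
mask_shift_then_narrow (unsigned shift)
{
  return (uint32_t) 0xff << shift;
}

uint32_t
mask_narrow_then_shift (unsigned shift)
{
  return ((uint32_t) 0xff >> shift) << shift;
}
```

For x = 0xABCD and a shift of 4, shift_then_narrow yields 0xBC (mask
0xFF0) whereas narrow_then_shift yields 0xC (mask 0xF0).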

for gcc/ChangeLog

PR tree-optimization/118206
* gimple-fold.cc (decode_field_reference): Account for upper
bits dropped by narrowing conversions whether before or after
a right shift.
(fold_truth_andor_for_ifcombine): Fold masks, compares, and
combined results.

for gcc/testsuite/ChangeLog

PR tree-optimization/118206
* gcc.dg/field-merge-18.c: New.

testsuite: generalized field-merge tests for <32-bit int [PR118025]

Explicitly convert constants to the desired types, so as to not elicit
warnings about implicit truncations, nor execution errors, on targets
whose ints are narrower than 32 bits.

for gcc/testsuite/ChangeLog

PR testsuite/118025
* gcc.dg/field-merge-1.c: Convert constants to desired types.
* gcc.dg/field-merge-3.c: Likewise.
* gcc.dg/field-merge-4.c: Likewise.
* gcc.dg/field-merge-5.c: Likewise.
* gcc.dg/field-merge-11.c: Likewise.
* gcc.dg/field-merge-17.c: Don't mess with padding bits.

testsuite: generalize ifcombine field-merge tests [PR118025]

A number of tests that check for specific ifcombine transformations
fail on AVR and PRU targets, whose type sizes and alignments aren't
conducive to the expected transformations.  Adjust the expectations.

Most execution tests should run successfully regardless of the
transformations, but a few that could conceivably fail if short and
char have the same bit width now check for that and bypass the tests
that would fail.

Conversely, one test that had such a runtime test, but that would work
regardless, no longer has that runtime test, and its types are
narrowed so that the transformations on 32-bit targets are more likely
to be the same as those that used to take place on 64-bit targets.
This latter change is somewhat obviated by a separate patch, but I've
left it in place anyway.

for  gcc/testsuite/ChangeLog

PR testsuite/118025
* gcc.dg/field-merge-1.c: Skip BIT_FIELD_REF counting on AVR and PRU.
* gcc.dg/field-merge-3.c: Bypass the test if short doesn't have the
expected size.
* gcc.dg/field-merge-8.c: Likewise.
* gcc.dg/field-merge-9.c: Likewise.  Skip optimization counting on
AVR and PRU.
* gcc.dg/field-merge-13.c: Skip optimization counting on AVR and PRU.
* gcc.dg/field-merge-15.c: Likewise.
* gcc.dg/field-merge-17.c: Likewise.
* gcc.dg/field-merge-16.c: Likewise.  Drop runtime bypass.  Use
smaller types.
* gcc.dg/field-merge-14.c: Add comments.

ifcombine field-merge: improve handling of dwords

On 32-bit hosts, data types with 64-bit alignment aren't getting
treated as desired by ifcombine field-merging: we limit the choice of
modes at BITS_PER_WORD sizes, but when deciding the boundary for a
split, we'd limit the choice only by the alignment, so we wouldn't
even consider a split at an odd 32-bit boundary.  Fix that by limiting
the boundary choice by word choice as well.

Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
data structures unhandled by ifcombine on 32-bit hosts.  We already
need to load them as double words, and if they're not byte-aligned,
the code gets really ugly, but ifcombine could improve it if it allows
double-word loads as a last resort.  I've added that.

for  gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
boundary choice by word size as well.  Try aligned double-word
loads as a last resort.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-17.c: New.

ipa-cp: Fold-convert values when necessary (PR 118138)

PR 118138 and quite a few duplicates that it has acquired in a short
time show that even though we are careful to make sure we do not lose
any bits when newly allowing type conversions in jump-functions, we
still need to perform the fold conversions during IPA constant
propagation and not just at the end in order to properly perform
sign-extensions or zero-extensions as appropriate.

This patch does just that, changing a safety predicate we already use
at the appropriate places to return the necessary type.
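
Why the intermediate type matters can be shown with ordinary C
conversions.  This is not the PR testcase, just a sketch with
hypothetical function names: the same low 8 bits widen to different
values depending on whether the type they pass through is signed or
unsigned.

```c
/* Widen VALUE after passing it through a signed 8-bit type
   (sign-extension).  */
long
widen_via_signed_char (int value)
{
  return (long) (signed char) value;
}

/* Widen VALUE after passing it through an unsigned 8-bit type
   (zero-extension).  */
long
widen_via_unsigned_char (int value)
{
  return (long) (unsigned char) value;
}
```

For a value of -1, widen_via_signed_char returns -1 while
widen_via_unsigned_char returns 255, so a propagated constant has to be
converted to the right type at each step rather than only once at the
end.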

gcc/ChangeLog:

2025-01-03 Martin Jambor <mjambor@suse.cz>

PR ipa/118138
* ipa-cp.cc (ipacp_value_safe_for_type): Return the appropriate
type instead of a bool, accept NULL_TREE VALUEs.
(propagate_vals_across_arith_jfunc): Use the new returned value of
ipacp_value_safe_for_type.
(propagate_vals_across_ancestor): Likewise.
(propagate_scalar_across_jump_function): Likewise.

gcc/testsuite/ChangeLog:

2025-01-03 Martin Jambor <mjambor@suse.cz>

PR ipa/118138
* gcc.dg/ipa/pr118138.c: New test.

nvptx: Add '__builtin_frame_address(0)' test case

Documenting the status quo.

gcc/testsuite/
* gcc.target/nvptx/__builtin_frame_address_0-1.c: New.

nvptx: Add '__builtin_stack_address()' test case

Documenting the status quo.

gcc/testsuite/
* gcc.target/nvptx/__builtin_stack_address-1.c: New.

testsuite: arm: Use -std=c17 and effective-target arm_arch_v5te_thumb

With -std=c23, the following errors are now emitted as the function
prototype and implementation do not match:

.../pr59858.c: In function 're_search_internal':
.../pr59858.c:95:17: error: too many arguments to function 'check_matching'
.../pr59858.c:75:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:100:1: error: conflicting types for 'check_matching'; have 'int(re_match_context_t *, int *)'
.../pr59858.c:75:12: note: previous declaration of 'check_matching' with type 'int(void)'
.../pr59858.c: In function 'check_matching':
.../pr59858.c:106:14: error: too many arguments to function 'transit_state'
.../pr59858.c:77:23: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:111:1: error: conflicting types for 'transit_state'; have 're_dfastate_t *(re_match_context_t *, re_dfastate_t *)'
.../pr59858.c:77:23: note: previous declaration of 'transit_state' with type 're_dfastate_t *(void)'
.../pr59858.c: In function 'transit_state':
.../pr59858.c:116:7: error: too many arguments to function 'build_trtable'
.../pr59858.c:79:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:121:1: error: conflicting types for 'build_trtable'; have 'int(const re_dfa_t *, re_dfastate_t *)'
.../pr59858.c:79:12: note: previous declaration of 'build_trtable' with type 'int(void)'

Adding -std=c17 removes these errors.
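
For reference, the alternative (not what this patch does) would be to
give the forward declarations full prototypes, which is accepted under
both -std=c17 and -std=c23.  The sketch below reuses the name
check_matching from the diagnostics; struct match_ctx, search_internal
and the function bodies are hypothetical stand-ins, not the contents of
pr59858.c.

```c
/* Hypothetical stand-in for the context type used by the test.  */
struct match_ctx
{
  int input_len;
};

/* Full prototype.  Writing "int check_matching ();" instead declares
   "int(void)" in C23, which then conflicts with the definition below.  */
int check_matching (struct match_ctx *ctx, int *match_last);

int
search_internal (struct match_ctx *ctx)
{
  int match_last = -1;
  int status = check_matching (ctx, &match_last);
  return status == 0 ? match_last : -1;
}

int
check_matching (struct match_ctx *ctx, int *match_last)
{
  *match_last = ctx->input_len - 1;
  return 0;
}
```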

Also, updated test case to use -mcpu=unset/-march=unset feature
introduced in r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr59858.c: Use -std=c17 and effective-target
arm_arch_v5te_thumb.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

ada: Incorrect accessibility level for library level subprograms

The patch fixes an issue in the compiler whereby accessibility level
calculations for objects declared within library-level subprograms
were done incorrectly - potentially allowing runtime accessibility
checks to spuriously pass.

gcc/ada/ChangeLog:

* accessibility.adb:
(Innermost_master_Scope_Depth): Add special case for expressions
within library level subprograms.

ada: Remove empty line.

gcc/ada/ChangeLog:

* env.h: Remove last empty line.

ada: Set syntactic node properties immediately when creating the nodes

When creating a node, we can directly set its syntactic properties.
Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* contracts.adb (Build_Call_Helper_Decl): Tune whitespace.
* exp_attr.adb (Analyze_Attribute): Set Of_Present while
creating the node; reorder setting Subtype_Indication to match the
syntax order.
* exp_ch3.adb (Build_Equivalent_Aggregate): Likewise for Box_Present
and Expression properties.
* sem_ch12.adb (Analyze_Formal_Derived_Type): Set type properties
when creating the nodes.
* sem_ch3.adb (Check_Anonymous_Access_Component): Likewise.

ada: Turn Is_Effective_Use_Clause from syntactic to semantic flag

For a USE clause, being effective is a semantic property, not a syntactic one.
AST cleanup; behavior is unaffected.

gcc/ada/ChangeLog:

* gen_il-gen-gen_nodes.adb (Gen_Nodes): Change Is_Effective_Use_Clause
from syntactic to semantic property.

ada: Reorder syntactic node fields to match the Ada RM grammar

Several AST nodes had their syntactic fields in a different order than
specified by the Ada RM grammar. With the variable-size nodes this no longer
had an impact on the AST memory layout and was making the automatically
generated Nmake routines a bit unintuitive to use.

gcc/ada/ChangeLog:

* exp_ch3.adb (Predef_Spec_Or_Body): Add explicit parameter
associations, because now the Empty_List actual parameter would be
confused as being for the Aspect_Specifications formal parameter.
* gen_il-gen-gen_nodes.adb (Gen_Nodes): Reorder syntactic fields.
* sem_util.adb (Declare_Indirect_Temp): Add explicit parameter
association, because now the parameter will be interpreted as a
subpool handle name.

c++: Fix up ICEs on constexpr inline asm strings in templates [PR118277]

The following patch fixes ICEs when the new inline asm syntax
to use C++26 static_assert-like constant expressions in place
of string literals is used in templates.
As finish_asm_stmt doesn't do any checking for
processing_template_decl, this patch also just defers handling
those strings in templates rather than say trying fold_non_dependent_expr
and if the result is non-dependent and usable, try to extract.

The patch also reverts changes to cp_parser_asm_specification_opt
which allowed something like
void foo () asm ((std::string_view ("bar")));
but it would be really hard to support
template <int N>
void baz () asm ((std::string_view ("qux")));
(especially with dependent constant expression).

And the patch adds extensive test coverage for the various errors.

2025-01-10 Jakub Jelinek <jakub@redhat.com>

PR c++/118277
* cp-tree.h (finish_asm_string_expression): Declare.
* semantics.cc (finish_asm_string_expression): New function.
(finish_asm_stmt): Use it.
* parser.cc (cp_parser_asm_string_expression): Likewise.
Wrap string into PAREN_EXPR in the ("") case.
(cp_parser_asm_definition): Don't ICE if finish_asm_stmt
returns error_mark_node.
(cp_parser_asm_specification_opt): Revert 2024-06-24 changes.
* pt.cc (tsubst_stmt): Don't ICE if finish_asm_stmt returns
error_mark_node.

* g++.dg/cpp1z/constexpr-asm-4.C: New test.
* g++.dg/cpp1z/constexpr-asm-5.C: New test.

c++: Fix up modules handling of namespace scope structured bindings

With the following patch I actually get a simple namespace scope structured
binding working with modules.

The core_vals change ensures we actually save/restore DECL_VALUE_EXPR even
for namespace scope vars, the get_merge_kind is based on the assumption
that structured bindings are always unique, one can't redeclare them and
without it we really ICE because their base vars have no name.

2025-01-10 Jakub Jelinek <jakub@redhat.com>

* module.cc (trees_out::core_vals): Note DECL_VALUE_EXPR even for
vars outside of functions.
(trees_in::core_vals): Read in DECL_VALUE_EXPR even for vars outside
of functions.
(trees_out::get_merge_kind): Make DECL_DECOMPOSITION_P MK_unique.

* g++.dg/modules/decomp-2_b.C: New test.
* g++.dg/modules/decomp-2_a.H: New file.

fortran: use_iso_fortran_env_module tweaks [PR118337]

This patch adds a comment to explain why we initialize the non-constant
elts of the symbol array separately, and a checking assert to verify that
the separate initialization bumps the iterator for each macro.

2025-01-10  Jakub Jelinek  <jakub@redhat.com>

PR fortran/118337
* module.cc (use_iso_fortran_env_module): Add a comment explaining
the optimization performed.  Add gcc_checking_assert that i was
incremented for all the elements.  Formatting fix.

c++: improve some modules comments

gcc/cp/ChangeLog:

* error.cc (cxx_initialize_diagnostics): Improve comment.
* module.cc (modules): Improve comment.
(get_originating_module): Add function comment.

c++: modules, generic lambda, constexpr if

In std/ranges/concat/1.cc we end up instantiating
concat_view::iterator::operator-, which has nested generic lambdas, where
the innermost is all constexpr if. tsubst_lambda_expr propagates
the returns_* flags for generic lambdas since we might not substitute into
the whole function, as in this case with constexpr if. But the module
wasn't preserving that flag, and so the importer gave a bogus "no return
statement" diagnostic.

gcc/cp/ChangeLog:

* module.cc (trees_out::write_function_def): Write returns* flags.
(struct post_process_data): Add returns_* flags.
(trees_in::read_function_def): Set them.
(module_state::read_cluster): Use them.

gcc/testsuite/ChangeLog:

* g++.dg/modules/constexpr-if-1_a.C: New test.
* g++.dg/modules/constexpr-if-1_b.C: New test.

LoongArch: Optimize the cost of vec_construct.

When analyzing 525 on the LoongArch architecture, it was found that the
for loop of the hotspot function x264_pixel_satd_8x4 could not be vectorized
with 256-bit vectors due to the vec_construct cost setting. After re-adjusting
vec_construct, the performance of the 525 program improved by 16.57%.
It was found that this function can be vectorized on the aarch64 and
x86 architectures, see [PR98138].

Co-Authored-By: Deng Jianbo <dengjianbo@loongson.cn>.
gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Modify the
construction cost of the vec_construct vector.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-slp-two-operator.c: New test.

Daily bump.

RISC-V: testsuite: fix target selector for sync_char_short

The effective-target selector for riscv on sync_char_short did not
check to see if atomics were enabled. As a result, these test cases were
run on targets without the a extension. Add additional checks for zalrsc
or zabha extensions.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix effective target sync_char_short
for riscv*-*-*

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

AArch64: Fix costing of emulated gathers/scatters [PR118188]

When a target does not support gathers and scatters the vectorizer tries to
emulate these using scalar loads/stores and a reconstruction of vectors from
scalar.

The loads are still marked with VMAT_GATHER_SCATTER to indicate that they are
gather/scatters, however the vectorizer also asks the target to cost the
instruction that generates the indexes for the emulated instructions.

This is done by asking the target to cost vec_to_scalar and vec_construct with
a stmt_vinfo being the VMAT_GATHER_SCATTER.

Since Adv. SIMD does not have an LD1 variant that takes an Adv. SIMD Scalar
element the operation is lowered entirely into a sequence of GPR loads to create
the x registers for the indexes.

At the moment however we don't cost these, and so the vectorizer thinks that
emulating the instructions is much cheaper than using an actual
gather/scatter with SVE.  Consider:

#define iterations 100000
#define LEN_1D 32000

float a[LEN_1D], b[LEN_1D];

float
s4115 (int *ip)
{
    float sum = 0.;
    for (int i = 0; i < LEN_1D; i++)
        {
            sum += a[i] * b[ip[i]];
        }
    return sum;
}

which before this patch with -mcpu=<sve-core> generates:

.L2:
        add     x3, x0, x1
        ldrsw   x4, [x0, x1]
        ldrsw   x6, [x3, 4]
        ldpsw   x3, x5, [x3, 8]
        ldr     s1, [x2, x4, lsl 2]
        ldr     s30, [x2, x6, lsl 2]
        ldr     s31, [x2, x5, lsl 2]
        ldr     s29, [x2, x3, lsl 2]
        uzp1    v30.2s, v30.2s, v31.2s
        ldr     q31, [x7, x1]
        add     x1, x1, 16
        uzp1    v1.2s, v1.2s, v29.2s
        zip1    v30.4s, v1.4s, v30.4s
        fmla    v0.4s, v31.4s, v30.4s
        cmp     x1, x8
        bne     .L2

but during costing:

a[i_18] 1 times vector_load costs 4 in body
*_4 1 times unaligned_load (misalign -1) costs 4 in body
b[_5] 4 times vec_to_scalar costs 32 in body
b[_5] 4 times scalar_load costs 16 in body
b[_5] 1 times vec_construct costs 3 in body
_1 * _6 1 times vector_stmt costs 2 in body
_7 + sum_16 1 times scalar_to_vec costs 4 in prologue
_7 + sum_16 1 times vector_stmt costs 2 in epilogue
_7 + sum_16 1 times vec_to_scalar costs 4 in epilogue
_7 + sum_16 1 times vector_stmt costs 2 in body

Here we see that the latency for the vec_to_scalar is very high.  We know the
intermediate vector isn't usable by the target ISA and will always be elided.
However these latencies need to remain high because when costing gather/scatters
IFNs we still pass the nunits of the type along.  In other words, the vectorizer
is still costing vector gather/scatters as scalar load/stores.

Lowering the cost for the emulated gathers would result in emulation being
seemingly cheaper.  So while the emulated costs are very high, they need to be
higher than those for the IFN costing.

i.e. the vectorizer generates:

  vect__5.9_8 = MEM <vector(4) intD.7> [(intD.7 *)vectp_ip.7_14];
  _35 = BIT_FIELD_REF <vect__5.9_8, 32, 0>;
  _36 = (sizetype) _35;
  _37 = _36 * 4;
  _38 = _34 + _37;
  _39 = (voidD.55 *) _38;
  # VUSE <.MEM_10(D)>
  _40 = MEM[(floatD.32 *)_39];

which after IVopts is:

  _63 = &MEM <vector(4) int> [(int *)ip_11(D) + ivtmp.19_27 * 1];
  _47 = BIT_FIELD_REF <MEM <vector(4) int> [(int *)_63], 32, 64>;
  _41 = BIT_FIELD_REF <MEM <vector(4) int> [(int *)_63], 32, 32>;
  _35 = BIT_FIELD_REF <MEM <vector(4) int> [(int *)_63], 32, 0>;
  _53 = BIT_FIELD_REF <MEM <vector(4) int> [(int *)_63], 32, 96>;

Which we correctly lower in RTL to individual loads to avoid the repeated umov.

As such, we should cost the vec_to_scalar as GPR loads and also do so for the
throughput which we at the moment cost as:

  note:  Vector issue estimate:
  note:    load operations = 6
  note:    store operations = 0
  note:    general operations = 6
  note:    reduction latency = 2
  note:    estimated min cycles per iteration = 2.000000

Which means 3 loads for the GPR indexes are missing, making it seem like the
emulated loop has a much lower cycles per iter than it actually does since the
bottleneck on the load units is not modelled.

But worse, because the vectorizer costs gathers/scatters IFNs as scalar
load/stores the number of loads required for an SVE gather is always much
higher than the equivalent emulated variant.

gcc/ChangeLog:

PR target/118188
* config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Adjust
throughput of emulated gather and scatters.

gcc/testsuite/ChangeLog:

PR target/118188
* gcc.target/aarch64/sve/gather_load_12.c: New test.
* gcc.target/aarch64/sve/gather_load_13.c: New test.
* gcc.target/aarch64/sve/gather_load_14.c: New test.

[PR118017][LRA]: Don't inherit reg of non-uniform reg class

In the PR case LRA inherited the value of a register of class INT_SSE_REGS,
which resulted in LRA cycling when LRA tried to use different move
alternatives with SSE/general regs and memory. The patch refuses to
inherit such (non-uniform) classes to prevent cycling.

gcc/ChangeLog:

PR target/118017
* lra-constraints.cc (inherit_reload_reg): Check reg class on uniformity.

gcc/testsuite/ChangeLog:

PR target/118017
* gcc.target/i386/pr118017.c: New.

c++: be permissive about eh spec mismatch for op new

r15-3532 made us more strict about exception-specification mismatches with
the standard library, but let's still be permissive about operator new,
since previously you needed to say throw(std::bad_alloc).

gcc/cp/ChangeLog:

* decl.cc (check_redeclaration_exception_specification): Be more
lenient about ::operator new.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept88.C: New test.

testsuite: arm: Fix typo in gcc.target/arm/armv8_2-fp16-conv-1.c

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-conv-1.c: Fix typo.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

s390: Add testcase for just fixed PR118362

On Thu, Jan 09, 2025 at 01:29:27PM +0100, Stefan Schulze Frielinghaus wrote:
> Optimization s390_constant_via_vgbm_p() should only apply to constant
> vectors which can be expressed by the hardware, i.e., which have a size
> of at most 16-bytes, similar as it is done for s390_constant_via_vgm_p()
> and s390_constant_via_vrepi_p().
>
> gcc/ChangeLog:
>
>       PR target/118362
>       * config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
>       16-byte vectors.
> ---
>  Bootstrap and regtest are still running.  If both are successful, I
>  will push this one promptly.

This was committed without a testcase; adding one IMHO shouldn't hurt.

2025-01-09  Jakub Jelinek  <jakub@redhat.com>

PR target/118362
* gcc.c-torture/compile/pr118362.c: New test.
* gcc.target/s390/pr118362.c: New test.

c: Restore warning for incomplete structures declared in parameter list [PR117866]

In C23 mode the warning about declaring structures and unions in
parameter lists was removed, because it is possible to redeclare
a compatible type elsewhere. This is not the case for incomplete types,
so restore the warning for those types.

PR c/117866

gcc/c/ChangeLog:
* c-decl.cc (get_parm_info): Change condition for warning.

gcc/testsuite/ChangeLog:
* gcc.dg/pr117866.c: New test.
* gcc.dg/strub-pr118007.c: Adapt.

testsuite: arm: Use -Os in memset-inline-8* tests

When the test was initially created, -fcommon was the default, but in
commit r10-4867-g6271dd984d7 the default value changed to -fno-common.
This change made the test start failing. To counter the over-alignment
caused by 'a' no longer being common, use -Os.

gcc/testsuite/ChangeLog:

* gcc.target/arm/memset-inline-8.c: Use -Os and prefix assembler
instructions with a tab to improve test stability.
* gcc.target/arm/memset-inline-8-exe.c: Use -Os.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Verify asm per function for armv8_2-fp16-conv-1.c

This change will enforce that the expected instructions are generated
per function rather than allowing some other function to use the
expected instructions.

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-conv-1.c: Convert
scan-assembler-times to check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

c, c++: preserve type name in conversion [PR116060]

When the program requests a conversion to a typedef, let's try harder to
remember the new name.

Torbjörn's original patch changed the type of the original expression, but
that seems not generally desirable; we might want either or both of the
original type and the converted-to type to be represented. So this
expresses the name change as a NOP_EXPR.

Compiling stdc++.h, this adds 519 allocations out of 1870k, or 0.28%.

The -Wsuggest-attribute=format change was necessary to do the check before
converting to the target type, which seems like an improvement.

PR c/116060

gcc/c/ChangeLog:

* c-typeck.cc (convert_for_assignment): Make sure left hand side and
right hand side have identical named types to aid diagnostic output.

gcc/cp/ChangeLog:

* call.cc (standard_conversion): Preserve type name in ck_identity.
(maybe_adjust_type_name): New.
(convert_like_internal): Use it.
Handle -Wsuggest-attribute=format here.
(convert_for_arg_passing): Not here.

gcc/testsuite/ChangeLog:

* c-c++-common/analyzer/out-of-bounds-diagram-8.c: Update to
correct type.
* c-c++-common/analyzer/out-of-bounds-diagram-11.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-10.c: Likewise.

Co-authored-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: Require trampolines for gcc.dg/pr118325.c

The test case uses a nested function, which is not supported by some
targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118325.c: Require effective target trampolines.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

s390: Fix s390_constant_via_vgbm_p() [PR118362]

Optimization s390_constant_via_vgbm_p() should only apply to constant
vectors which can be expressed by the hardware, i.e., which have a size
of at most 16 bytes, similar to what is done for s390_constant_via_vgm_p()
and s390_constant_via_vrepi_p().

gcc/ChangeLog:

PR target/118362
* config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
16-byte vectors.

c++: ICE during requires-expr partial subst [PR118060]

Here during partial substitution of the requires-expression (as part of
CTAD constraint rewriting) we segfault from the INDIRECT_REF case of
convert_to_void due to *f(u) being type-dependent. We should just defer
checking convert_to_void until satisfaction.

PR c++/118060

gcc/cp/ChangeLog:

* constraint.cc (tsubst_valid_expression_requirement): Don't
check convert_to_void during partial substitution.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires40.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: tf_partial and instantiate_template [PR117887]

Ever since r15-3530-gdfb63765e994be, the extra-args mechanism expects
to see tf_partial whenever doing a partial substitution containing
dependent arguments. The below testcases show that instantiate_template
for AT with args={T}/{T*} is neglecting to set it in that case, and we
end up ICEing from add_extra_args during the subsequent full substitution.

This patch makes instantiate_template set tf_partial accordingly.

PR c++/117887

gcc/cp/ChangeLog:

* pt.cc (instantiate_template): Set tf_partial if arguments are
dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires39.C: New test.
* g++.dg/cpp2a/lambda-targ10.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: constexpr potentiality of CAST_EXPR [PR117925]

We're incorrectly treating the templated callee (FnPtr)fnPtr, represented
as CAST_EXPR with TREE_LIST operand, as potentially constant here due to
neglecting to look through the TREE_LIST in the CAST_EXPR case of p_c_e_1.
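
For orientation, a callee of this shape can be sketched as follows.  Only the
names FnPtr and fnPtr come from the message above; the signature, the generic
AnyFn type and the helpers are hypothetical, and this is not the testcase
added by the patch.

```cpp
using FnPtr = int (*)(int);
using AnyFn = void (*)();  // hypothetical type-erased function pointer

inline int add_one(int x) { return x + 1; }

template <class T>
int call_through(AnyFn fnPtr, T arg) {
  // The callee is a C-style cast written inside a template.
  return ((FnPtr)fnPtr)(arg);
}

// Erase the type of add_one so it can be passed around as an AnyFn.
inline AnyFn erased_add_one() { return reinterpret_cast<AnyFn>(&add_one); }
```

Calling call_through(erased_add_one(), 41) casts the pointer back to its
original type before the call, so the call itself is well-defined.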

PR c++/117925

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) <case CAST_EXPR>:
Fix check for class conversion to literal type to properly look
through the TREE_LIST operand of a CAST_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent35.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: relax ICE for unexpected trees during constexpr [PR117925]

When we encounter an unexpected (likely templated) tree code during
constexpr evaluation we currently ICE even in release mode. But it
seems more user-friendly to just gracefully treat the expression as
non-constant, which will be harmless most of the time (e.g. in the case
of warning-specific or speculative constexpr folding as in the PR), and
at worst will transform an ICE-on-valid bug into a rejects-valid bug.
This is also what e.g. tsubst_expr does when it encounters an unexpected
tree code.

PR c++/117925

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression) <default>:
Relax ICE when encountering an unexpected tree code into a
checking ICE guarded by flag_checking.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: current inst w/ indirect dependent bases [PR117993]

In the first testcase we're overeagerly diagnosing qualified name lookup
failure for f from the current instantiation B<T>::C ahead of time
because we (correctly) deem C to not have any direct dependent bases:
its direct base B<T> is part of the current instantiation and therefore
not a dependent base, and we decide it's safe to diagnose name lookup
failure ahead of time.

But this testcase demonstrates it's not enough to consider only direct
dependent bases: f is defined in A<T> which is a dependent base of
B<T>, so qualified name lookup from C won't search it ahead of time and
in turn won't be exhaustive, and so it's wrong to diagnose lookup
failure ahead of time.  This ultimately suggests that
any_dependent_bases_p needs to consider indirect bases as well.

To that end it seems sufficient to make the predicate recurse into any
!BINFO_DEPENDENT_BASE_P base since the recursive call will exit early
for non-dependent types.  So effectively we'll only recurse into bases
belonging to the current instantiation.

I considered more narrowly making only dependentish_scope_p consider
indirect dependent bases, but it seems other any_dependent_bases_p
callers also want this behavior, e.g. build_new_method_call for the benefit
of the second testcase (which is an even older regression since GCC 7).

PR c++/117993

gcc/cp/ChangeLog:

* search.cc (any_dependent_bases_p): Recurse into bases (of
dependent type) that are not BINFO_DEPENDENT_BASE_P.  Document
default argument.

gcc/testsuite/ChangeLog:

* g++.dg/template/dependent-base4.C: New test.
* g++.dg/template/dependent-base5.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: template-id dependence wrt local static arg [PR117792]

Here we end up ICEing at instantiation time for the call to
f<local_static> ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.

The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address. Other kinds of template-ids aren't
affected since they're coerced ahead of time.

So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).

PR c++/117792

gcc/cp/ChangeLog:

* pt.cc (type_dependent_expression_p): Consider the dependence
of the address of each template argument of a function
template-id.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

arm: [MVE intrinsics] Another fix for moves of tuples (PR target/118131)

Commit r15-6389-g670df03e5294a3 only partially fixed support for moves
of large modes: despite the introduction of V2x* and V4x* modes in
r15-6245-g4f4e13dd235b to support MVE tuples, we still need to support
TI, OI and XI modes, which appear for instance in gcc.dg/pr100887.c.

The problem was noticed when running the testsuite with
-mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
where several tests would ICE in output_move_neon.

gcc/ChangeLog:

PR target/118131
* config/arm/arm.h (VALID_MVE_STRUCT_MODE): Accept TI, OI and XI
modes again.

'git mv gcc/testsuite/gcc.dg/{,torture/}crc-linux-3.c'

Just as recent commit 96f5fd3089075b56ea9ea85060213cc4edd7251a
"Move some CRC tests into the gcc.dg/torture directory" moved a few files, this
one also needs to go into torture testing: otherwise, it's compiled just at
'-O0', where the CRC optimization pass isn't active.

gcc/testsuite/
* gcc.dg/crc-linux-3.c: Move...
* gcc.dg/torture/crc-linux-3.c: ... here.

nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]

..., and use it for '-mno-soft-stack': PTX "native" stacks.
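
The source constructs that exercise these patterns look roughly like the
sketch below, which uses the GCC builtins __builtin_alloca,
__builtin_stack_save and __builtin_stack_restore (the builtins named in the
adjusted testcases).  The function itself is a hypothetical example, not one
of those tests, and it assumes n is greater than zero.

```cpp
#include <cstddef>
#include <cstring>

// Sum n bytes of src through a stack buffer whose size is known only at
// run time (hypothetical helper; assumes n > 0).
int sum_via_alloca(const unsigned char* src, std::size_t n) {
  void* saved_sp = __builtin_stack_save();  // remember the stack pointer
  unsigned char* buf = static_cast<unsigned char*>(__builtin_alloca(n));
  std::memcpy(buf, src, n);
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += buf[i];
  __builtin_stack_restore(saved_sp);  // release the dynamic allocation
  return total;
}
```

The buffer must not be touched after the restore; the running total lives in
an ordinary local variable and stays valid.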

PR target/65181
gcc/
* config/nvptx/nvptx.cc (nvptx_get_drap_rtx): Handle
'!TARGET_SOFT_STACK'.
* config/nvptx/nvptx.md (define_c_enum "unspec"): Add
'UNSPEC_STACKSAVE', 'UNSPEC_STACKRESTORE'.
(define_expand "allocate_stack", define_expand "save_stack_block")
(define_expand "save_stack_block"): Handle '!TARGET_SOFT_STACK',
PTX 'alloca'.
(define_insn "@nvptx_alloca_<mode>")
(define_insn "@nvptx_stacksave_<mode>")
(define_insn "@nvptx_stackrestore_<mode>"): New.
* doc/invoke.texi (Nvidia PTX Options): Update '-msoft-stack',
'-mno-soft-stack'.
* doc/sourcebuild.texi (nvptx-specific attributes): Document
'nvptx_runtime_alloca_ptx'.
(Add Options): Document 'nvptx_alloca_ptx'.
gcc/testsuite/
* gcc.target/nvptx/alloca-1.c: Evolve into...
* gcc.target/nvptx/alloca-1-O0.c: ... this, ...
* gcc.target/nvptx/alloca-1-O1.c: ... this, and...
* gcc.target/nvptx/alloca-1-sm_30.c: ... this.
* gcc.target/nvptx/vla-1.c: Evolve into...
* gcc.target/nvptx/vla-1-O0.c: ... this, ...
* gcc.target/nvptx/vla-1-O1.c: ... this, and...
* gcc.target/nvptx/vla-1-sm_30.c: ... this.
* gcc.c-torture/execute/pr36321.c: Adjust.
* gcc.target/nvptx/__builtin_alloca_0-1-O0.c: Likewise.
* gcc.target/nvptx/__builtin_alloca_0-1-O1.c: Likewise.
* gcc.target/nvptx/__builtin_stack_save___builtin_stack_restore-1.c:
Likewise.
* gcc.target/nvptx/softstack.c: Likewise.
* gcc.target/nvptx/__builtin_stack_save___builtin_stack_restore-1-sm_30.c:
New.
* gcc.target/nvptx/alloca-2-O0.c: Likewise.
* gcc.target/nvptx/alloca-3-O1.c: Likewise.
* gcc.target/nvptx/alloca-4-O3.c: Likewise.
* gcc.target/nvptx/alloca-5.c: Likewise.
* lib/target-supports.exp (check_effective_target_alloca): Adjust.
(check_nvptx_default_ptx_isa_target_architecture_at_least)
(check_nvptx_runtime_ptx_isa_target_architecture_at_least)
(check_effective_target_nvptx_runtime_alloca_ptx)
(add_options_for_nvptx_alloca_ptx): New.
libgomp/
* fortran.c (omp_get_device_from_uid_): Adjust.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

Avoid PHI node re-allocation in loop copying

duplicate_loop_body_to_header_edge redirects the original loop entry
edge to the loop copy header and the copied loop exit to the old
loop header. But it does so in an order that requires temporary
space for an extra edge on the original loop header, causing
unnecessary re-allocations. The following avoids this by swapping
the order of the redirects.
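
The effect of the ordering can be seen in a toy model, sketched here under
the assumption that the header's PHI arguments live in an array sized exactly
for the current number of incoming edges.  ToyPhi and both helper functions
are hypothetical and do not correspond to GCC data structures.

```cpp
#include <cstddef>

// Toy stand-in for a PHI node's argument array (hypothetical): it grows by
// one slot, and counts a reallocation, whenever it is full.
struct ToyPhi {
  std::size_t capacity;
  std::size_t num_args;
  int reallocations = 0;

  void add_arg() {
    if (num_args == capacity) {
      ++capacity;
      ++reallocations;
    }
    ++num_args;
  }
  void remove_arg() { --num_args; }
};

// Old order: the copied loop's exit is redirected to the old header first
// (adds an edge), then the original entry is redirected away (removes one).
int reallocs_add_then_remove(std::size_t preds) {
  ToyPhi phi{preds, preds};
  phi.add_arg();
  phi.remove_arg();
  return phi.reallocations;
}

// New order: redirect the entry away first, then add the exit edge.
int reallocs_remove_then_add(std::size_t preds) {
  ToyPhi phi{preds, preds};
  phi.remove_arg();
  phi.add_arg();
  return phi.reallocations;
}
```

In this model the old order needs one reallocation for a header with two
incoming edges, while the swapped order needs none.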

* cfgloopmanip.cc (duplicate_loop_body_to_header_edge): When
copying to the header edge first redirect the entry to the
new loop and then the exit to the old to avoid PHI node
re-allocation.

ada: Fix missing detection of late equality operator returning subtype of Boolean

In Ada 2012, the compiler fails to check that a primitive equality operator
for an untagged record type must appear before the type is frozen, when the
operator returns a subtype of Boolean. This plugs the legality loophole but
adds the debug switch -gnatd_q to go back to the previous state.

gcc/ada/ChangeLog:

PR ada/18765
* debug.adb (d_q): Document new usage.
* sem_ch6.adb (New_Overloaded_Entity): Apply the special processing
to all equality operators whose base result type is Boolean, but do
not enforce the new Ada 2012 freezing rule if the result type is a
proper subtype of it and the -gnatd_q switch is specified.

ada: Accept predefined multiply operator for fixed point in expression function

The RM 4.5.5(19.1/2) subclause says that the predefined multiply operator
for universal_fixed is still available, despite the declaration of a
user-defined primitive multiply operator for the fixed-point type in question, if
it is identified using an expanded name with prefix denoting Standard, but
this is currently not the case in the context of an expression function.

gcc/ada/ChangeLog:

PR ada/118274
* sem_ch4.adb (Check_Arithmetic_Pair.Has_Fixed_Op): Use the original
node of the operator to identify the case of an expanded name whose
prefix is the package Standard.