Andrew Carlotti [Tue, 15 Oct 2024 16:31:28 +0000 (17:31 +0100)]
Add new hardreg PRE pass
This pass is used to optimise assignments to the FPMR register in
aarch64. I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.
Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later. This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.
Any uses of the hardreg in debug insns will be deleted. We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).
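The transparency restriction described above can be sketched as a toy dataflow model (hypothetical Python, not the gcse.cc code): a block is transparent for the candidate expression only if nothing in it sets, uses, or clobbers the hard register.

```python
# Toy model of the hardreg PRE transparency rule: a set of the hardreg
# cannot move through a block that sets, uses, or clobbers the hardreg.

def block_transparent(insns, hardreg):
    """insns: list of dicts with 'sets'/'uses'/'clobbers' register sets."""
    for insn in insns:
        if (hardreg in insn.get("sets", set())
                or hardreg in insn.get("uses", set())
                or hardreg in insn.get("clobbers", set())):
            return False
    return True

blocks = {
    "bb1": [{"sets": {"x0"}, "uses": {"x1"}}],   # unrelated regs: transparent
    "bb2": [{"uses": {"fpmr"}}],                 # a use of the hardreg kills motion
    "bb3": [{"clobbers": {"fpmr"}}],             # so does a clobber
}
transp = {name: block_transparent(insns, "fpmr")
          for name, insns in blocks.items()}
```

This is what distinguishes the pass from ordinary RTL PRE, where writing to a fresh pseudo sidesteps these kill conditions.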
gcc/ChangeLog:
* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New function.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.
Andrew Carlotti [Tue, 7 Jan 2025 18:32:23 +0000 (18:32 +0000)]
Disable a broken multiversioning optimisation
This patch skips redirect_to_specific clone for aarch64 and riscv,
because the optimisation has two flaws:
1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.
2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time. A correct implementation
would need to verify that:
- if the current caller version were selected at runtime, then the
chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
a higher priority caller version would have been eligible for
selection (and hence the current caller version wouldn't have been
selected).
The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
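The two conditions above can be checked semantically when the runtime feature sets can be enumerated (the compiler can't do this, which is why attribute-implication target hooks would be needed; the brute-force model below is purely illustrative, with hypothetical names).

```python
# Brute-force model of the two soundness conditions for redirecting a
# call in one caller version directly to a specific callee version.
# Dispatch picks the highest-priority version whose predicate holds.

def selected(versions, features):
    live = [v for v in versions if v[1](features)]
    return max(live, key=lambda v: v[0]) if live else None

def redirection_sound(caller, callers, callee, callees, feature_sets):
    for fs in feature_sets:
        if selected(callers, fs) is caller:
            if not callee[1](fs):           # 1. chosen callee must be eligible
                return False
            best = selected(callees, fs)    # 2. no higher-priority callee may
            if best is not None and best[0] > callee[0]:  # win here
                return False
    return True

def has_sve(fs): return "sve" in fs
def always(fs): return True

callers = [(1, has_sve), (0, always)]   # (priority, predicate)
callees = [(1, has_sve), (0, always)]
fsets = [set(), {"sve"}]

ok = redirection_sound(callers[1], callers, callees[1], callees, fsets)
bad = redirection_sound(callers[1], callers, callees[0], callees, fsets)
```

Redirecting the default caller to the default callee is sound (`ok`), while redirecting it to the SVE callee is not (`bad`), since the SVE callee isn't eligible whenever the default caller is selected.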
Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage. However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.
Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.
gcc/ChangeLog:
* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.
Andrew Carlotti [Tue, 30 Jul 2024 17:36:22 +0000 (18:36 +0100)]
aarch64: Add new +frintts flag
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
* config/aarch64/arm_neon.h: Ditto.
Andrew Carlotti [Thu, 1 Aug 2024 10:54:41 +0000 (11:54 +0100)]
aarch64: Add new +jscvt flag
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.
Andrew Carlotti [Thu, 1 Aug 2024 10:54:20 +0000 (11:54 +0100)]
aarch64: Add new +fcma flag
This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.
Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
(SVE): Add FCMA dependency.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.
Jakub Jelinek [Fri, 10 Jan 2025 14:07:41 +0000 (15:07 +0100)]
c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]
Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c/118376
* c-parser.cc (c_parser_postfix_expression): Call
set_c_expr_source_range before break in the __builtin_stdc_rotate_*
case.
as a nop. This PR shows that that isn't always correct.
The compare in the set above is between two 0/1 booleans (at least
on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
produced the incoming (reg:CC cc) is unconstrained; it could be between
arbitrary integers, or even floats. The fold is therefore replacing a
cc that is valid for both signed and unsigned comparisons with one that
is only known to be valid for signed comparisons.
(gt (compare (gt cc 0) (lt cc 0)) 0)
does simplify to:
(gt cc 0)
but:
(gtu (compare (gt cc 0) (lt cc 0)) 0)
does not simplify to:
(gtu cc 0)
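A quick model of why the unsigned variant is wrong (hypothetical Python, treating registers as 32-bit and letting the incoming cc be a signed comparison of arbitrary a and b):

```python
# The outer GTU on the boolean compare is fine (0/1 values compare the
# same signed or unsigned), but folding it to GTU on the *original* cc
# turns a signed a > b into an unsigned one, which differs.

MASK = (1 << 32) - 1   # 32-bit register model (assumption)

def original(a, b):
    x, y = int(a > b), int(a < b)     # the two 0/1 booleans from cc
    return (x & MASK) > (y & MASK)    # GTU on the compare of the booleans

def folded(a, b):
    return (a & MASK) > (b & MASK)    # GTU applied to the original cc
```

For a = -1, b = 0 the original expression is false (0 is not above 1), but the folded form is true (0xFFFFFFFF is above 0 unsigned), so the rewrite is unsound.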
The optimisation didn't come with a testcase, but it was added for
i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much
as it once did, since it's now conditional on -minline-all-stringops.
But the patch is almost 25 years old, so whatever the original
motivation was, it seems likely that other things now rely on it.
It therefore seems better to try to preserve the optimisation on rtl
rather than get rid of it. To do that, we need to look at how the
result of the outer compare is used. We'd therefore be looking at four
instructions (the gt, the lt, the compare, and the use of the compare),
but combine already allows that for 3-instruction combinations thanks
to:
/* If the source is a COMPARE, look for the use of the comparison result
and try to simplify it unless we already have used undobuf.other_insn. */
When applied to boolean inputs, a comparison operator is
effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
simplify_logical_relational_operation already had code to simplify
logical operators between two comparison results, but:
* It only handled IOR, which doesn't cover all the cases needed here.
The others are easily added.
* It treated comparisons of integers as having an ORDERED/UNORDERED result.
Therefore:
* it would not treat "true for LT + EQ + GT" as "always true" for
comparisons between integers, because the mask excluded the UNORDERED
condition.
* it would try to convert "true for LT + GT" into LTGT even for comparisons
between integers. To prevent an ICE later, the code used:
/* Many comparison codes are only valid for certain mode classes. */
if (!comparison_code_valid_for_mode (code, mode))
return 0;
However, this used the wrong mode, since "mode" is here the integer
result of the comparisons (and the mode of the IOR), not the mode of
the things being compared. Thus the effect was to reject all
floating-point-only codes, even when comparing floats.
I think instead the code should detect whether the comparison is between
integer values and remove UNORDERED from consideration if so. It then
always produces a valid comparison (or an always true/false result),
and so comparison_code_valid_for_mode is not needed. In particular,
"true for LT + GT" becomes NE for comparisons between integers but
remains LTGT for comparisons between floats.
* There was a missing check for whether the comparison inputs had
side effects.
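The ordered/unordered distinction from the bullets above is easy to demonstrate directly (illustrative Python, not the simplify-rtx code):

```python
# "True for LT or GT" is exactly NE for integers, but for floats it is
# LTGT, which is false on unordered operands and so differs from NE
# whenever one operand is NaN.

def true_for_lt_or_gt(x, y):
    return x < y or x > y   # LTGT semantics: false if unordered

# integers: LT + GT == NE over a small sample
for x in (-1, 0, 1):
    for y in (-1, 0, 1):
        assert true_for_lt_or_gt(x, y) == (x != y)

nan = float("nan")
ltgt_nan = true_for_lt_or_gt(1.0, nan)   # unordered: false
ne_nan = (1.0 != nan)                    # NE is true for NaN
```

So removing UNORDERED from consideration for integer comparisons lets the fold emit NE, while float comparisons must keep LTGT.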
While there, it also seemed worth extending
simplify_logical_relational_operation to unsigned comparisons, since
that makes the testing easier.
As far as that testing goes: the patch exhaustively tests all
combinations of integer comparisons in:
(cmp1 (cmp2 X Y) (cmp3 X Y))
for the 10 integer comparisons, giving 1000 fold attempts in total.
It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
on the result of the fold, giving 9 checks per fold, or 9000 in total.
That's probably more than is typical for self-tests, but it seems to
complete in negligible time, even for -O0 builds.
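The shape of that exhaustive check can be sketched in Python (an illustrative model, not the GCC self-test; restricted here to signed inner comparisons, since mixing signed and unsigned inner comparisons need not collapse to a single comparison):

```python
# Every cmp1((cmp2 x y), (cmp3 x y)) over signed integer inner
# comparisons collapses to a single comparison or a constant, because
# the result depends only on whether x < y, x == y, or x > y.
import itertools, operator

MASK = (1 << 32) - 1   # 32-bit model for the unsigned comparisons
signed_cmps = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
               "ne": operator.ne, "ge": operator.ge, "gt": operator.gt}
unsigned_cmps = {"ltu": lambda a, b: (a & MASK) < (b & MASK),
                 "leu": lambda a, b: (a & MASK) <= (b & MASK),
                 "gtu": lambda a, b: (a & MASK) > (b & MASK),
                 "geu": lambda a, b: (a & MASK) >= (b & MASK)}
all_cmps = signed_cmps | unsigned_cmps

points = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
# truth tables of the ten single comparisons, plus always-true/false
simple = {tuple(bool(f(x, y)) for x, y in points) for f in all_cmps.values()}
simple |= {(True,) * 9, (False,) * 9}

checks = 0
for c2, c3 in itertools.product(signed_cmps.values(), repeat=2):
    for c1 in all_cmps.values():
        table = tuple(bool(c1(int(c2(x, y)), int(c3(x, y)))) for x, y in points)
        assert table in simple   # collapses to one comparison or a constant
        checks += len(points)
```

Each combination's truth table over the 9 sample points matches some single comparison (or a constant), which is what the simplification relies on.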
gcc/
PR rtl-optimization/117186
* rtl.h (simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.
* simplify-rtx.cc (unsigned_comparison_to_mask): New function.
(mask_to_unsigned_comparison): Likewise.
(comparison_code_valid_for_mode): Delete.
(simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter. Handle AND and XOR. Handle unsigned
comparisons. Handle always-false results. Ignore the low bit
of the mask if the operands are always ordered and remove the
then-redundant check of comparison_code_valid_for_mode. Check
for side-effects in the operands before simplifying them away.
(simplify_context::simplify_binary_operation_1): Remove
simplification of (compare (gt ...) (lt ...)) and instead...
(simplify_context::simplify_relational_operation_1): ...handle
comparisons of comparisons here.
(test_comparisons): New function.
(test_scalar_ops): Call it.
gcc/testsuite/
PR rtl-optimization/117186
* gcc.dg/torture/pr117186.c: New test.
* gcc.target/aarch64/pr117186.c: Likewise.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:47 +0000 (09:32 -0300)]
[ifcombine] drop other misuses of uniform_integer_cst_p
As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
ifcombine makes no sense, we're not dealing with vectors. Indeed,
I've been misunderstanding and misusing it since I cut&pasted it from
some preexisting match predicate in earlier version of the ifcombine
field-merge patch.
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Drop misuses of
uniform_integer_cst_p.
(fold_truth_andor_for_ifcombine): Likewise.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:43 +0000 (09:32 -0300)]
[ifcombine] fix mask variable test to match use [PR118344]
There was a cut&pasto in the rr_and_mask's adjustment to match the
combined type: the test on whether there was a mask already was
testing the wrong variable, and then it might crash or otherwise fail
accessing an undefined mask. This only hit with checking enabled,
and rarely at that.
for gcc/ChangeLog
PR tree-optimization/118344
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
rr_and_mask's type adjustment test.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:38 +0000 (09:32 -0300)]
[ifcombine] reuse left-hand mask to decode right-hand xor operand
If fold_truth_andor_for_ifcombine applies a mask to an xor, say
because the result of the xor is compared with a power of two [minus
one], we have to apply the same mask when processing both the left-
and right-hand xor paths for the transformation to be sound. Arrange
for decode_field_reference to propagate the incoming mask along with
the expression to the right-hand operand.
Don't require the right-hand xor operand to be a constant; that was a
cut&pasto.
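The masking requirement above is a simple bit-level identity (a sketch, not the gimple-fold code): a mask on the result of an xor must be applied to both operands, not just one.

```python
# Masking both xor operands is equivalent to masking the result;
# masking only one side is not.

def masked_xor(a, b, m):
    return (a ^ b) & m

m = 0b0110
for a in range(16):
    for b in range(16):
        assert masked_xor(a, b, m) == ((a & m) ^ (b & m))  # sound

# masking only the left operand leaks the right operand's masked-out bits:
one_sided = (0b0000 & m) ^ 0b1000          # right side left unmasked
sound = masked_xor(0b0000, 0b1000, m)      # mask propagated to both
```

With a high bit only in the right operand, the one-sided form keeps that bit while the correct result drops it, which is exactly why decode_field_reference must propagate the incoming mask to the right-hand xor operand.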
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
Propagate pand_mask to the right-hand xor operand. Don't
require the right-hand xor operand to be a constant.
(fold_truth_andor_for_ifcombine): Pass right-hand mask when
appropriate.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:33 +0000 (09:32 -0300)]
[ifcombine] adjust for narrowing converts before shifts [PR118206]
A narrowing conversion and a shift both drop bits from the loaded
value, but we need to take into account which one comes first to get
the right number of bits and mask.
Fold when applying masks to parts, comparing the parts, and combining
the results, in the odd chance either mask happens to be zero.
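The order sensitivity is easy to see with a toy value (illustrative Python, hypothetical constants, not the ifcombine code):

```python
# Narrowing then shifting keeps different bits of the loaded value
# than shifting then narrowing, so the mask computation must track
# which operation came first.

x = 0xABCD  # arbitrary loaded value

narrow_then_shift = (x & 0xFF) >> 4   # narrow to 8 bits, then shift right
shift_then_narrow = (x >> 4) & 0xFF   # shift right, then narrow to 8 bits
```

Here the first form yields 0xC (from the low byte 0xCD) while the second yields 0xBC, so conflating the two orders produces the wrong number of bits and the wrong mask.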
for gcc/ChangeLog
PR tree-optimization/118206
* gimple-fold.cc (decode_field_reference): Account for upper
bits dropped by narrowing conversions whether before or after
a right shift.
(fold_truth_andor_for_ifcombine): Fold masks, compares, and
combined results.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:27 +0000 (09:32 -0300)]
testsuite: generalized field-merge tests for <32-bit int [PR118025]
Explicitly convert constants to the desired types, so as to not elicit
warnings about implicit truncations, nor execution errors, on targets
whose ints are narrower than 32 bits.
A number of tests that check for specific ifcombine transformations
fail on AVR and PRU targets, whose type sizes and alignments aren't
conducive to the expected transformations. Adjust the expectations.
Most execution tests should run successfully regardless of the
transformations, but a few that could conceivably fail if short and
char have the same bit width now check for that and bypass the tests
that would fail.
Conversely, one test that had such a runtime test, but that would work
regardless, no longer has that runtime test, and its types are
narrowed so that the transformations on 32-bit targets are more likely
to be the same as those that used to take place on 64-bit targets.
This latter change is somewhat obviated by a separate patch, but I've
left it in place anyway.
for gcc/testsuite/ChangeLog
PR testsuite/118025
* gcc.dg/field-merge-1.c: Skip BIT_FIELD_REF counting on AVR and PRU.
* gcc.dg/field-merge-3.c: Bypass the test if short doesn't have the
expected size.
* gcc.dg/field-merge-8.c: Likewise.
* gcc.dg/field-merge-9.c: Likewise. Skip optimization counting on
AVR and PRU.
* gcc.dg/field-merge-13.c: Skip optimization counting on AVR and PRU.
* gcc.dg/field-merge-15.c: Likewise.
* gcc.dg/field-merge-17.c: Likewise.
* gcc.dg/field-merge-16.c: Likewise. Drop runtime bypass. Use
smaller types.
* gcc.dg/field-merge-14.c: Add comments.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:05 +0000 (09:32 -0300)]
ifcombine field-merge: improve handling of dwords
On 32-bit hosts, data types with 64-bit alignment aren't getting
treated as desired by ifcombine field-merging: we limit the choice of
modes at BITS_PER_WORD sizes, but when deciding the boundary for a
split, we'd limit the choice only by the alignment, so we wouldn't
even consider a split at an odd 32-bit boundary. Fix that by limiting
the boundary choice by word size as well.
Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
data structures unhandled by ifcombine on 32-bit hosts. We already
need to load them as double words, and if they're not byte-aligned,
the code gets really ugly, but ifcombine could improve it if it allows
double-word loads as a last resort. I've added that.
for gcc/ChangeLog
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
boundary choice by word size as well. Try aligned double-word
loads as a last resort.
Martin Jambor [Sat, 4 Jan 2025 19:40:07 +0000 (20:40 +0100)]
ipa-cp: Fold-convert values when necessary (PR 118138)
PR 118138 and quite a few duplicates that it has acquired in a short
time show that even though we are careful to make sure we do not lose
any bits when newly allowing type conversions in jump-functions, we
still need to perform the fold conversions during IPA constant
propagation and not just at the end in order to properly perform
sign-extensions or zero-extensions as appropriate.
This patch does just that, changing a safety predicate we already use
at the appropriate places to return the necessary type.
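The extension issue can be modelled in a few lines (a hypothetical sketch, not the ipa-cp code): truncating a propagated constant to a narrower type must sign-extend or zero-extend according to that type's signedness, or the propagated value is simply wrong.

```python
# Fold-convert a constant to a `bits`-wide integer type, extending
# according to the target type's signedness.

def fold_convert(value, bits, signed):
    value &= (1 << bits) - 1                  # truncate to the narrow type
    if signed and value & (1 << (bits - 1)):  # sign bit set: sign-extend
        value -= 1 << bits
    return value
```

For example, 200 folded to a signed 8-bit type is -56, but to an unsigned 8-bit type it stays 200; doing this only at the very end, after further arithmetic, gives different results than doing it at each propagation step.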
gcc/ChangeLog:
2025-01-03 Martin Jambor <mjambor@suse.cz>
PR ipa/118138
* ipa-cp.cc (ipacp_value_safe_for_type): Return the appropriate
type instead of a bool, accept NULL_TREE VALUEs.
(propagate_vals_across_arith_jfunc): Use the new returned value of
ipacp_value_safe_for_type.
(propagate_vals_across_ancestor): Likewise.
(propagate_scalar_across_jump_function): Likewise.
testsuite: arm: Use -std=c17 and effective-target arm_arch_v5te_thumb
With -std=c23, the following errors are now emitted because the function
prototypes and implementations do not match:
.../pr59858.c: In function 're_search_internal':
.../pr59858.c:95:17: error: too many arguments to function 'check_matching'
.../pr59858.c:75:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:100:1: error: conflicting types for 'check_matching'; have 'int(re_match_context_t *, int *)'
.../pr59858.c:75:12: note: previous declaration of 'check_matching' with type 'int(void)'
.../pr59858.c: In function 'check_matching':
.../pr59858.c:106:14: error: too many arguments to function 'transit_state'
.../pr59858.c:77:23: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:111:1: error: conflicting types for 'transit_state'; have 're_dfastate_t *(re_match_context_t *, re_dfastate_t *)'
.../pr59858.c:77:23: note: previous declaration of 'transit_state' with type 're_dfastate_t *(void)'
.../pr59858.c: In function 'transit_state':
.../pr59858.c:116:7: error: too many arguments to function 'build_trtable'
.../pr59858.c:79:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:121:1: error: conflicting types for 'build_trtable'; have 'int(const re_dfa_t *, re_dfastate_t *)'
.../pr59858.c:79:12: note: previous declaration of 'build_trtable' with type 'int(void)'
Adding -std=c17 removes these errors.
Also, update the test case to use the -mcpu=unset/-march=unset feature
introduced in r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr59858.c: Use -std=c17 and effective-target
arm_arch_v5te_thumb.
squirek [Fri, 1 Nov 2024 16:33:02 +0000 (16:33 +0000)]
ada: Incorrect accessibility level for library level subprograms
The patch fixes an issue in the compiler whereby accessibility level
calculations for objects declared witihin library-level subprograms
were done incorrectly - potentially allowing runtime accessibility
checks to spuriously pass.
gcc/ada/ChangeLog:
* accessibility.adb:
(Innermost_Master_Scope_Depth): Add special case for expressions
within library level subprograms.
Piotr Trojanek [Mon, 23 Dec 2024 09:05:47 +0000 (10:05 +0100)]
ada: Set syntactic node properties immediately when creating the nodes
When creating a node, we can directly set its syntactic properties.
Code cleanup; semantics is unaffected.
gcc/ada/ChangeLog:
* contracts.adb (Build_Call_Helper_Decl): Tune whitespace.
* exp_attr.adb (Analyze_Attribute): Set Of_Present while
creating the node; reorder setting Subtype_Indication to match the
syntax order.
* exp_ch3.adb (Build_Equivalent_Aggregate): Likewise for Box_Present
and Expression properties.
* sem_ch12.adb (Analyze_Formal_Derived_Type): Set type properties
when creating the nodes.
* sem_ch3.adb (Check_Anonymous_Access_Component): Likewise.
Piotr Trojanek [Fri, 20 Dec 2024 12:00:37 +0000 (13:00 +0100)]
ada: Reorder syntactic node fields to match the Ada RM grammar
Several AST nodes had their syntactic fields in a different order than
specified by the Ada RM grammar. With the variable-size nodes this no longer
had an impact on the AST memory layout and was making the automatically
generated Nmake routines a bit unintuitive to use.
gcc/ada/ChangeLog:
* exp_ch3.adb (Predef_Spec_Or_Body): Add explicit parameter
associations, because now the Empty_List actual parameter would be
confused as being for the Aspect_Specifications formal parameter.
* gen_il-gen-gen_nodes.adb (Gen_Nodes): Reorder syntactic fields.
* sem_util.adb (Declare_Indirect_Temp): Add explicit parameter
association, because now the parameter will be interpreted as a
subpool handle name.
Jakub Jelinek [Fri, 10 Jan 2025 09:32:36 +0000 (10:32 +0100)]
c++: Fix up ICEs on constexpr inline asm strings in templates [PR118277]
The following patch fixes ICEs when the new inline asm syntax
to use C++26 static_assert-like constant expressions in place
of string literals is used in templates.
As finish_asm_stmt doesn't do any checking for
processing_template_decl, this patch also just defers handling
those strings in templates rather than say trying fold_non_dependent_expr
and if the result is non-dependent and usable, try to extract.
The patch also reverts changes to cp_parser_asm_specification_opt
which allowed something like
void foo () asm ((std::string_view ("bar")));
but it would be really hard to support
template <int N>
void baz () asm ((std::string_view ("qux")));
(especially with dependent constant expression).
And the patch adds extensive test coverage for the various errors.
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/118277
* cp-tree.h (finish_asm_string_expression): Declare.
* semantics.cc (finish_asm_string_expression): New function.
(finish_asm_stmt): Use it.
* parser.cc (cp_parser_asm_string_expression): Likewise.
Wrap string into PAREN_EXPR in the ("") case.
(cp_parser_asm_definition): Don't ICE if finish_asm_stmt
returns error_mark_node.
(cp_parser_asm_specification_opt): Revert 2024-06-24 changes.
* pt.cc (tsubst_stmt): Don't ICE if finish_asm_stmt returns
error_mark_node.
* g++.dg/cpp1z/constexpr-asm-4.C: New test.
* g++.dg/cpp1z/constexpr-asm-5.C: New test.
Jakub Jelinek [Fri, 10 Jan 2025 09:31:12 +0000 (10:31 +0100)]
c++: Fix up modules handling of namespace scope structured bindings
With the following patch I actually get a simple namespace scope structured
binding working with modules.
The core_vals change ensures that we actually save/restore DECL_VALUE_EXPR
even for namespace scope vars. The get_merge_kind change is based on the
assumption that structured bindings are always unique (one can't redeclare
them); without it we ICE because their base vars have no name.
2025-01-10 Jakub Jelinek <jakub@redhat.com>
* module.cc (trees_out::core_vals): Note DECL_VALUE_EXPR even for
vars outside of functions.
(trees_in::core_vals): Read in DECL_VALUE_EXPR even for vars outside
of functions.
(trees_out::get_merge_kind): Make DECL_DECOMPOSITION_P MK_unique.
* g++.dg/modules/decomp-2_b.C: New test.
* g++.dg/modules/decomp-2_a.H: New file.
This patch adds a comment to explain why we initialize the non-constant
elts of the symbol array separately, and a checking assert to verify that
the separate initialization bumps the iterator for each macro.
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR fortran/118337
* module.cc (use_iso_fortran_env_module): Add a comment explaining
the optimization performed. Add gcc_checking_assert that i was
incremented for all the elements. Formatting fix.
Jason Merrill [Tue, 26 Nov 2024 04:29:25 +0000 (23:29 -0500)]
c++: modules, generic lambda, constexpr if
In std/ranges/concat/1.cc we end up instantiating
concat_view::iterator::operator-, which has nested generic lambdas, where
the innermost is all constexpr if. tsubst_lambda_expr propagates
the returns_* flags for generic lambdas since we might not substitute into
the whole function, as in this case with constexpr if. But the module
wasn't preserving that flag, and so the importer gave a bogus "no return
statement" diagnostic.
gcc/cp/ChangeLog:
* module.cc (trees_out::write_function_def): Write returns* flags.
(struct post_process_data): Add returns_* flags.
(trees_in::read_function_def): Set them.
(module_state::read_cluster): Use them.
gcc/testsuite/ChangeLog:
* g++.dg/modules/constexpr-if-1_a.C: New test.
* g++.dg/modules/constexpr-if-1_b.C: New test.
chenxiaolong [Tue, 7 Jan 2025 13:04:51 +0000 (21:04 +0800)]
LoongArch: Optimize the cost of vec_construct.
When analyzing 525 on the LoongArch architecture, it was found that the
for loop of the hotspot function x264_pixel_satd_8x4 could not be
vectorized to 256 bits due to the cost setting of vec_construct. After
re-adjusting vec_construct, the performance of the 525 program improved
by 16.57%.
It was found that this function can be vectorized on the aarch64 and
x86 architectures, see [PR98138].
Co-Authored-By: Deng Jianbo <dengjianbo@loongson.cn>.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Modify the
construction cost of the vec_construct vector.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-slp-two-operator.c: New test.
Edwin Lu [Thu, 9 Jan 2025 18:32:07 +0000 (10:32 -0800)]
RISC-V: testsuite: fix target selector for sync_char_short
The effective-target selector for riscv on sync_char_short did not
check to see if atomics were enabled. As a result, these test cases were
run on targets without the a extension. Add additional checks for zalrsc
or zabha extensions.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Fix effective target sync_char_short
for riscv*-*-*.
Tamar Christina [Thu, 9 Jan 2025 21:31:05 +0000 (21:31 +0000)]
AArch64: Fix costing of emulated gathers/scatters [PR118188]
When a target does not support gathers and scatters the vectorizer tries to
emulate these using scalar loads/stores and a reconstruction of vectors from
scalars.
The loads are still marked with VMAT_GATHER_SCATTER to indicate that they are
gather/scatters, however the vectorizer also asks the target to cost the
instruction that generates the indexes for the emulated instructions.
This is done by asking the target to cost vec_to_scalar and vec_construct with
a stmt_vinfo being the VMAT_GATHER_SCATTER.
Since Adv. SIMD does not have an LD1 variant that takes an Adv. SIMD Scalar
element the operation is lowered entirely into a sequence of GPR loads to create
the x registers for the indexes.
At the moment, however, we don't cost these, and so the vectorizer thinks that
when it emulates the instructions that it's much cheaper than using an actual
gather/scatter with SVE. Consider:
#define iterations 100000
#define LEN_1D 32000
float a[LEN_1D], b[LEN_1D];
float
s4115 (int *ip)
{
float sum = 0.;
for (int i = 0; i < LEN_1D; i++)
{
sum += a[i] * b[ip[i]];
}
return sum;
}
which before this patch with -mcpu=<sve-core> generates:
a[i_18] 1 times vector_load costs 4 in body
*_4 1 times unaligned_load (misalign -1) costs 4 in body
b[_5] 4 times vec_to_scalar costs 32 in body
b[_5] 4 times scalar_load costs 16 in body
b[_5] 1 times vec_construct costs 3 in body
_1 * _6 1 times vector_stmt costs 2 in body
_7 + sum_16 1 times scalar_to_vec costs 4 in prologue
_7 + sum_16 1 times vector_stmt costs 2 in epilogue
_7 + sum_16 1 times vec_to_scalar costs 4 in epilogue
_7 + sum_16 1 times vector_stmt costs 2 in body
Here we see that the latency for the vec_to_scalar is very high. We know the
intermediate vector isn't usable by the target ISA and will always be elided.
However these latencies need to remain high because when costing gather/scatters
IFNs we still pass the nunits of the type along. In other words, the vectorizer
is still costing vector gather/scatters as scalar load/stores.
Lowering the cost for the emulated gathers would result in emulation being
seemingly cheaper. So while the emulated costs are very high, they need to be
higher than those for the IFN costing.
We correctly lower these in RTL to individual loads, avoiding the repeated umov.
As such, we should cost the vec_to_scalar as GPR loads and also do so for the
throughput which we at the moment cost as:
note: Vector issue estimate:
note: load operations = 6
note: store operations = 0
note: general operations = 6
note: reduction latency = 2
note: estimated min cycles per iteration = 2.000000
This means the 3 loads for the GPR indexes are missing, making it seem
like the emulated loop has a much lower cycles-per-iteration count than
it actually does, since the bottleneck on the load units is not modelled.
But worse, because the vectorizer costs gathers/scatters IFNs as scalar
load/stores the number of loads required for an SVE gather is always much
higher than the equivalent emulated variant.
gcc/ChangeLog:
PR target/118188
* config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Adjust
throughput of emulated gather and scatters.
gcc/testsuite/ChangeLog:
PR target/118188
* gcc.target/aarch64/sve/gather_load_12.c: New test.
* gcc.target/aarch64/sve/gather_load_13.c: New test.
* gcc.target/aarch64/sve/gather_load_14.c: New test.
[PR118017][LRA]: Don't inherit reg of non-uniform reg class
In the PR case LRA inherited value of register of class INT_SSE_REGS
which resulted in LRA cycling when LRA tried to use different move
alternatives with SSE/general regs and memory. The patch rejects
inheriting such (non-uniform) classes to prevent cycling.
gcc/ChangeLog:
PR target/118017
* lra-constraints.cc (inherit_reload_reg): Check reg class on uniformity.
Jason Merrill [Thu, 9 Jan 2025 20:12:07 +0000 (15:12 -0500)]
c++: be permissive about eh spec mismatch for op new
r15-3532 made us more strict about exception-specification mismatches with
the standard library, but let's still be permissive about operator new,
since previously you needed to say throw(std::bad_alloc).
gcc/cp/ChangeLog:
* decl.cc (check_redeclaration_exception_specification): Be more
lenient about ::operator new.
Jakub Jelinek [Thu, 9 Jan 2025 21:04:58 +0000 (22:04 +0100)]
s390: Add testcase for just fixed PR118362
On Thu, Jan 09, 2025 at 01:29:27PM +0100, Stefan Schulze Frielinghaus wrote:
> Optimization s390_constant_via_vgbm_p() should only apply to constant
> vectors which can be expressed by the hardware, i.e., which have a size
> of at most 16-bytes, similar as it is done for s390_constant_via_vgm_p()
> and s390_constant_via_vrepi_p().
>
> gcc/ChangeLog:
>
> PR target/118362
> * config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
> 16-byte vectors.
> ---
> Bootstrap and regtest are still running. If both are successful, I
> will push this one promptly.
This was committed without a testcase; adding one IMHO shouldn't hurt.
2025-01-09 Jakub Jelinek <jakub@redhat.com>
PR target/118362
* gcc.c-torture/compile/pr118362.c: New test.
* gcc.target/s390/pr118362.c: New test.
Martin Uecker [Mon, 6 Jan 2025 14:32:16 +0000 (15:32 +0100)]
c: Restore warning for incomplete structures declared in parameter list [PR117866]
In C23 mode the warning about declaring structures and unions in
parameter lists was removed, because it is possible to redeclare
a compatible type elsewhere. This is not the case for incomplete types,
so restore the warning for those types.
PR c/117866
gcc/c/ChangeLog:
* c-decl.cc (get_parm_info): Change condition for warning.
gcc/testsuite/ChangeLog:
* gcc.dg/pr117866.c: New test.
* gcc.dg/strub-pr118007.c: Adapt.
When the test was initially created, -fcommon was the default, but in
commit r10-4867-g6271dd984d7 the default value changed to -fno-common.
This change made the test start failing. To counter the over-alignment
caused by 'a' no longer being common, use -Os.
gcc/testsuite/ChangeLog:
* gcc.target/arm/memset-inline-8.c: Use -Os and prefix assembler
instructions with a tab to improve test stability.
* gcc.target/arm/memset-inline-8-exe.c: Use -Os.
testsuite: arm: Verify asm per function for armv8_2-fp16-conv-1.c
This change enforces that the expected instructions are generated
per function, rather than letting a match in some other function
satisfy the check.
gcc/testsuite/ChangeLog:
* gcc.target/arm/armv8_2-fp16-conv-1.c: Convert
scan-assembler-times to check-function-bodies.
Jason Merrill [Mon, 23 Dec 2024 14:18:41 +0000 (09:18 -0500)]
c, c++: preserve type name in conversion [PR116060]
When the program requests a conversion to a typedef, let's try harder to
remember the new name.
Torbjörn's original patch changed the type of the original expression, but
that seems not generally desirable; we might want either or both of the
original type and the converted-to type to be represented. So this
expresses the name change as a NOP_EXPR.
Compiling stdc++.h, this adds 519 allocations out of 1870k, or 0.28%.
The -Wsuggest-attribute=format change was necessary to do the check before
converting to the target type, which seems like an improvement.
PR c/116060
gcc/c/ChangeLog:
* c-typeck.cc (convert_for_assignment): Make sure the left-hand side and
right-hand side have identical named types to aid diagnostic output.
gcc/cp/ChangeLog:
* call.cc (standard_conversion): Preserve type name in ck_identity.
(maybe_adjust_type_name): New.
(convert_like_internal): Use it.
Handle -Wsuggest-attribute=format here.
(convert_for_arg_passing): Not here.
Optimization s390_constant_via_vgbm_p() should only apply to constant
vectors which can be expressed by the hardware, i.e., which have a size
of at most 16-bytes, similar as it is done for s390_constant_via_vgm_p()
and s390_constant_via_vrepi_p().
gcc/ChangeLog:
PR target/118362
* config/s390/s390.cc (s390_constant_via_vgbm_p): Allow at most
16-byte vectors.
Patrick Palka [Thu, 9 Jan 2025 15:50:19 +0000 (10:50 -0500)]
c++: ICE during requires-expr partial subst [PR118060]
Here during partial substitution of the requires-expression (as part of
CTAD constraint rewriting) we segfault from the INDIRECT_REF case of
convert_to_void due to *f(u) being type-dependent. We should just defer
checking convert_to_void until satisfaction.
PR c++/118060
gcc/cp/ChangeLog:
* constraint.cc (tsubst_valid_expression_requirement): Don't
check convert_to_void during partial substitution.
Patrick Palka [Thu, 9 Jan 2025 15:50:16 +0000 (10:50 -0500)]
c++: tf_partial and instantiate_template [PR117887]
Ever since r15-3530-gdfb63765e994be the extra-args mechanism now expects
to see tf_partial whenever doing a partial substitution containing
dependent arguments. The below testcases show that instantiate_template
for AT with args={T}/{T*} is neglecting to set it in that case, and we
end up ICEing from add_extra_args during the subsequent full substitution.
This patch makes instantiate_template set tf_partial accordingly.
PR c++/117887
gcc/cp/ChangeLog:
* pt.cc (instantiate_template): Set tf_partial if arguments are
dependent.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires39.C: New test.
* g++.dg/cpp2a/lambda-targ10.C: New test.
Patrick Palka [Thu, 9 Jan 2025 15:50:12 +0000 (10:50 -0500)]
c++: constexpr potentiality of CAST_EXPR [PR117925]
We're incorrectly treating the templated callee (FnPtr)fnPtr, represented
as CAST_EXPR with TREE_LIST operand, as potentially constant here due to
neglecting to look through the TREE_LIST in the CAST_EXPR case of p_c_e_1.
PR c++/117925
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case CAST_EXPR>:
Fix check for class conversion to literal type to properly look
through the TREE_LIST operand of a CAST_EXPR.
Patrick Palka [Thu, 9 Jan 2025 15:50:08 +0000 (10:50 -0500)]
c++: relax ICE for unexpected trees during constexpr [PR117925]
When we encounter an unexpected (likely templated) tree code during
constexpr evaluation we currently ICE even in release mode. But it
seems more user-friendly to just gracefully treat the expression as
non-constant, which will be harmless most of the time (e.g. in the case
of warning-specific or speculative constexpr folding as in the PR), and
at worst will transform an ICE-on-valid bug into a rejects-valid bug.
This is also what e.g. tsubst_expr does when it encounters an unexpected
tree code.
PR c++/117925
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) <default>:
Relax ICE when encountering an unexpected tree code into a
checking ICE guarded by flag_checking.
Patrick Palka [Thu, 9 Jan 2025 15:50:06 +0000 (10:50 -0500)]
c++: current inst w/ indirect dependent bases [PR117993]
In the first testcase we're overeagerly diagnosing qualified name lookup
failure for f from the current instantiation B<T>::C ahead of time
because we (correctly) deem C to not have any direct dependent bases:
its direct base B<T> is part of the current instantiation and therefore
not a dependent base, and we decide it's safe to diagnose name lookup
failure ahead of time.
But this testcase demonstrates it's not enough to consider only direct
dependent bases: f is defined in A<T> which is a dependent base of
B<T>, so qualified name lookup from C won't search it ahead of time and
in turn won't be exhaustive, and so it's wrong to diagnose lookup
failure ahead of time. This ultimately suggests that
any_dependent_bases_p needs to consider indirect bases as well.
To that end it seems sufficient to make the predicate recurse into any
!BINFO_DEPENDENT_BASE_P base since the recursive call will exit early
for non-dependent types. So effectively we'll only recurse into bases
belonging to the current instantiation.
I considered more narrowly making only dependentish_scope_p consider
indirect dependent bases, but it seems other any_dependent_bases_p
callers also want this behavior, e.g. build_new_method_call for benefit
of the second testcase (which is an even older regression since GCC 7).
PR c++/117993
gcc/cp/ChangeLog:
* search.cc (any_dependent_bases_p): Recurse into bases (of
dependent type) that are not BINFO_DEPENDENT_BASE_P. Document
default argument.
gcc/testsuite/ChangeLog:
* g++.dg/template/dependent-base4.C: New test.
* g++.dg/template/dependent-base5.C: New test.
Patrick Palka [Thu, 9 Jan 2025 15:49:45 +0000 (10:49 -0500)]
c++: template-id dependence wrt local static arg [PR117792]
Here we end up ICEing at instantiation time for the call to
f<local_static> ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.
The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address. Other kinds of template-ids aren't
affected since they're coerced ahead of time.
So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).
PR c++/117792
gcc/cp/ChangeLog:
* pt.cc (type_dependent_expression_p): Consider the dependence
of the address of each template argument of a function
template-id.
Christophe Lyon [Fri, 20 Dec 2024 20:31:29 +0000 (20:31 +0000)]
arm: [MVE intrinsics] Another fix for moves of tuples (PR target/118131)
Commit r15-6389-g670df03e5294a3 only partially fixed support for moves
of large modes: despite the introduction of V2x* and V4x* modes in r15-6245-g4f4e13dd235b to support MVE tuples, we still need to support
TI, OI and XI modes, which appear for instance in gcc.dg/pr100887.c.
The problem was noticed when running the testsuite with
-mthumb/-march=armv8.1-m.main+mve.fp+fp.dp/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
where several tests would ICE in output_move_neon.
gcc/ChangeLog:
PR target/118131
* config/arm/arm.h (VALID_MVE_STRUCT_MODE): Accept TI, OI and XI
modes again.
Like recent commit 96f5fd3089075b56ea9ea85060213cc4edd7251a
"Move some CRC tests into the gcc.dg/torture directory" moved a few files, this
one also needs to go into torture testing: otherwise, it's compiled just at
'-O0', where the CRC optimization pass isn't active.
Richard Biener [Wed, 8 Jan 2025 14:12:30 +0000 (15:12 +0100)]
Avoid PHI node re-allocation in loop copying
duplicate_loop_body_to_header_edge redirects the original loop entry
edge to the loop copy header and the copied loop exit to the old
loop header. But it does so in the order that requires temporary
space for an extra edge on the original loop header, causing
unnecessary re-allocations. The following avoids this by swapping
the order of the redirects.
* cfgloopmanip.cc (duplicate_loop_body_to_header_edge): When
copying to the header edge first redirect the entry to the
new loop and then the exit to the old to avoid PHI node
re-allocation.
Eric Botcazou [Sun, 5 Jan 2025 16:34:41 +0000 (17:34 +0100)]
ada: Fix missing detection of late equality operator returning subtype of Boolean
In Ada 2012, the compiler fails to check that a primitive equality operator
for an untagged record type must appear before the type is frozen, when the
operator returns a subtype of Boolean. This plugs the legality loophole but
adds the debug switch -gnatd_q to go back to the previous state.
gcc/ada/ChangeLog:
PR ada/18765
* debug.adb (d_q): Document new usage.
* sem_ch6.adb (New_Overloaded_Entity): Apply the special processing
to all equality operators whose base result type is Boolean, but do
not enforce the new Ada 2012 freezing rule if the result type is a
proper subtype of it and the -gnatd_q switch is specified.
Eric Botcazou [Sun, 5 Jan 2025 22:21:31 +0000 (23:21 +0100)]
ada: Accept predefined multiply operator for fixed point in expression function
The RM 4.5.5(19.1/2) subclause says that the predefined multiply operator
for universal_fixed is still available, despite the declaration of a user-
defined primitive multiply operator for the fixed-point type at stake, if
it is identified using an expanded name with prefix denoting Standard, but
this is currently not the case in the context of an expression function.
gcc/ada/ChangeLog:
PR ada/118274
* sem_ch4.adb (Check_Arithmetic_Pair.Has_Fixed_Op): Use the original
node of the operator to identify the case of an expanded name whose
prefix is the package Standard.
squirek [Mon, 30 Dec 2024 19:59:45 +0000 (19:59 +0000)]
ada: Error on Disable_Controlled aspect in Multiway_Trees
This patch fixes an issue in the compiler whereby instantiating Multiway_Trees
with a formal type leads to a compile-time error due to the expression supplied
for aspect Disable_Controlled specified on types declared within
Multiway_Trees' body not being static.
gcc/ada/ChangeLog:
* libgnat/a-comutr.adb, libgnat/a-comutr.ads:
Move the declarations of iterator types into the specification and
add additional comments.
Javier Miranda [Wed, 25 Dec 2024 06:42:10 +0000 (06:42 +0000)]
ada: Cleanup preanalysis of static expressions (part 3)
Avoid reporting spurious errors.
gcc/ada/ChangeLog:
* freeze.adb (Freeze_Expr_Types): Reverse patch; that is, restore
calls to Preanalyze_Spec_Expression instead of Preanalyze_And_Resolve
for the sake of consistency with Analyze_Expression_Function. Patch
suggested by Eric Botcazou.
* exp_put_image.adb (Image_Should_Call_Put_Image): Ensure that
function Defining_Identifier is called with a proper node to
avoid internal assertion failure.
Jakub Jelinek [Thu, 9 Jan 2025 07:30:12 +0000 (08:30 +0100)]
match.pd: Avoid introducing UB in the a r<< (32-b) -> a r>> b optimization [PR117927]
As mentioned in the PR, the a r<< (bitsize-b) to a r>> b and similar
match.pd optimization which has been introduced in GCC 15 can introduce
UB which wasn't there before, in particular if b is equal at runtime
to bitsize, then a r<< 0 is turned into a r>> bitsize.
The following patch fixes it by optimizing it early only if VRP
tells us the count isn't equal to the bitsize, and late into
a r>> (b & (bitsize - 1)) if bitsize is power of two and the subtraction
has single use, on various targets the masking then goes away because
its rotate instructions do masking already. The latter can't be
done too early though, because then the expr_not_equal_to case is
basically useless and we introduce the masking always and can't find out
anymore that there was originally no masking. Even cfun->after_inlining
check would be too early, there is forwprop before vrp, so the patch
introduces a new PROP for the start of the last forwprop pass.
2025-01-09 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <quic_apinski@quicinc.com>
PR tree-optimization/117927
* tree-pass.h (PROP_last_full_fold): Define.
* passes.def: Add last= parameters to pass_forwprop.
* tree-ssa-forwprop.cc (pass_forwprop): Add last_p non-static
data member and initialize it in the ctor.
(pass_forwprop::set_pass_param): New method.
(pass_forwprop::execute): Set PROP_last_full_fold in curr_properties
at the start if last_p.
* match.pd (a rrotate (32-b) -> a lrotate b): Only optimize either
if @2 is known not to be equal to prec or if during/after last
forwprop the subtraction has single use and prec is power of two; in
that case transform it into orotate by masked count.
Jakub Jelinek [Thu, 9 Jan 2025 07:25:49 +0000 (08:25 +0100)]
fortran: Accept "15" modules for compatibility [PR118337]
Based on the comments in the PR, I've tried to write a patch which would
try to keep backwards compatibility with the GCC 11-14 *.mod files.
This means reordering the *.def files, so that the entries present already
in GCC 11-14 come before the ones new in GCC 15, and tweaking modules.cc
such that it can accept that ordering and while it has a newer MOD_VERSION,
it accepts even the previous one when loading modules.
2025-01-09 Jakub Jelinek <jakub@redhat.com>
PR fortran/118337
* module.cc (COMPAT_MOD_VERSIONS): Define.
(use_iso_fortran_env_module): Don't assume all NAMED_INTCSTs come
first followed by NAMED_UINTCSTs.
(gfc_use_module): Accept also COMPAT_MOD_VERSIONS for compatibility.
* iso-c-binding.def: Reorder entries so that the GCC 14 ones come
before the ones new in GCC 15.
* iso-fortran-env.def: Likewise.
xuli [Wed, 8 Jan 2025 04:11:30 +0000 (04:11 +0000)]
RISC-V: Refine registered_functions list for rvv overloaded intrinsics.
Before this patch, each rvv overloaded intrinsic was registered twice,
both in gcc and g++.
Take vint8mf8_t __riscv_vle8(vbool64_t vm, const int8_t *rs1, size_t vl)
as an example.
For gcc, one decl is void __riscv_vle8(void), and the other is
integer_zero_node, which is redundant.
For g++, one decl is integer_zero_node, which is redundant.
The other is vint8mf8_t __riscv_vle8(vbool64_t vm, const int8_t *rs1, size_t vl).
Additionally, rfn is saved in the non_overloaded_function_table, which is also redundant.
After this patch, both gcc and g++ register each rvv overloaded intrinsic once.
Only gcc's rfn will be added to the non_overloaded_function_table.
Passed the rv64gcv regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (function_builder::add_unique_function):
Only register overloaded intrinsic for g++.
Only insert non_overloaded_function_table for gcc.
(function_builder::add_overloaded_function): Only register overloaded intrinsic for gcc.
(handle_pragma_vector): Only initialize non_overloaded_function_table for gcc.
For 'omp dispatch interop(obj)', call omp_get_interop_int to
obtain the device number. Add a new error when multiple
interop-clause list items exist but no device clause is present.
Update some vars and continue after appending interop args
instead of restarting to ensure that adjust_args updates
are actually performed.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTRMODE_PTR_INT_PTR): Add.
* gimplify.cc (gimplify_call_expr): Add error for multiple
list items to the OpenMP interop clause if no device clause;
continue instead of restarting after append_args handling.
(gimplify_omp_dispatch): Extract device number from the
single interop-clause list item.
* omp-builtins.def (BUILT_IN_OMP_GET_INTEROP_INT): Add.
Thomas Schwinge [Mon, 16 Dec 2024 10:48:11 +0000 (11:48 +0100)]
nvptx: Add effective-target 'nvptx_softstack', use for effective-target 'alloca'
..., and thereby making the check for effective-target 'alloca' more explicit.
As of commit 5012919d0bd344ac1888e8e531072f0ccbe24d2c (Subversion r242503)
"nvptx backend prerequisites for OpenMP offloading", the check for
effective-target 'alloca' did "use a compile test"; let's make this more
explicit: supported for '-msoft-stack', not supported otherwise.
gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_nvptx_softstack): New.
(check_effective_target_alloca) [nvptx]: Use it.
gcc/
* doc/sourcebuild.texi (Effective-Target Keywords): Document
'nvptx_softstack'.
Jakub Jelinek [Wed, 8 Jan 2025 22:12:02 +0000 (23:12 +0100)]
c++: Honor complain in cp_build_function_call_vec for check_function_arguments warnings [PR117825]
The following testcase ICEs due to re-entering diagnostics.
When diagnosing -Wformat-security warning, we try to print instantiation
context, which calls tsubst with tf_none, but that in the end calls
cp_build_function_call_vec which calls check_function_arguments which
diagnoses another warning (again -Wformat-security).
The other check_function_arguments caller, build_over_call, doesn't call
that function if !(complain & tf_warning), so I think the best fix is
to do it the same in cp_build_function_call_vec as well.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117825
* typeck.cc (cp_build_function_call_vec): Don't call
check_function_arguments if complain doesn't have tf_warning bit set.
Thomas Schwinge [Thu, 12 Dec 2024 22:26:14 +0000 (23:26 +0100)]
nvptx: Add '__builtin_alloca(0)' test cases [PR65181]
Documenting the status quo. This specific behavior relates to a 1994 change
(Subversion r7229, Git commit 15fc002672d643fd9d93d220027b5cd2aefc632c).
That one, however, isn't to blame here: we'd otherwise of course run into
nvptx' 'sorry, unimplemented: target cannot support alloca'.
gcc/configure: Fix check for assembler section merging support on Arm
In 32-bit Arm assembly, the @ character is the start of a comment so
the section type needs to use the % character instead.
configure.ac attempts to account for this difference by doing a second
try when checking the assembler for section merging support.
Unfortunately there is a bug: because the gcc_GAS_CHECK_FEATURE macro
has a call to AC_CACHE_CHECK, it will actually skip the second try
because the gcc_cv_as_shf_merge variable has already been set:
checking assembler for section merging support... no
checking assembler for section merging support... (cached) no
Fix by using a separate variable for the second try, as is done in the
check for COMDAT group support.
This problem was noticed because the recent binutils commit d5cbf916be4a ("gas/ELF: also reject merge entity size being zero") caused
gas to be stricter about mergeable sections without an entity size:
configure:27013: checking assembler for section merging support
configure:27022: /path/to/as --fatal-warnings -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:1: Warning: invalid merge / string entity size
conftest.s: Error: 1 warning, treating warnings as errors
configure:27025: $? = 1
configure: failed program was
.section .rodata.str, "aMS", @progbits, 1
configure:27036: result: no
In previous versions of gas the conftest.s program above was accepted
and configure detected support for section merging.
See also:
https://linaro.atlassian.net/browse/GNU-1427
https://sourceware.org/bugzilla/show_bug.cgi?id=32491
Tested on armv8l-linux-gnueabihf.
gcc/ChangeLog:
* configure.ac: Fix check for HAVE_GAS_SHF_MERGE on Arm targets.
* configure: Regenerate.
Jason Merrill [Tue, 24 Dec 2024 00:56:43 +0000 (19:56 -0500)]
c++: add ref checks in conversion code
While looking at another patch I noticed that on a few tests we were doing
nonsensical things like building a reference to a reference. Make sure we
catch that sooner. But let's be friendly in can_convert, since it doesn't
return a conversion that could be wrongly applied to a reference.
gcc/cp/ChangeLog:
* call.cc (implicit_conversion): Check that FROM isn't a reference
if we also got an EXPR argument.
(convert_like_internal): Check that EXPR isn't a reference.
(can_convert_arg): convert_from_reference if needed.
Jason Merrill [Mon, 23 Dec 2024 17:32:54 +0000 (12:32 -0500)]
c++: print stub object as std::declval
If the result of build_stub_object gets printed by %E it looks something
like '(A&&)1', which seems confusing. Let's instead print it as
'std::declval<A>()' since that's how the library writes the same idea.
gcc/cp/ChangeLog:
* method.cc (is_stub_object): New.
* cp-tree.h (is_stub_object): Declare.
* error.cc (dump_expr): Use it.
Jason Merrill [Tue, 24 Dec 2024 00:57:56 +0000 (19:57 -0500)]
c++: fix conversion issues
Some issues caught by a check from another patch:
In the convert_like_internal bad_p handling, we are iterating from outside
to inside, so once we recurse into convert_like we need to stop looping.
In build_ramp_function, we're assigning REFERENCE_TYPE things, so we need to
build the assignment directly rather than rely on functions that implement
C++ semantics.
In omp_declare_variant_finalize_one, the parameter object building failed to
handle reference parms, and it seems simpler to just use build_stub_object
like other parts of the compiler.
Jakub Jelinek [Wed, 8 Jan 2025 19:07:47 +0000 (20:07 +0100)]
fortran: Bump MOD_VERSION to "16" [PR118337]
As mentioned in the PR, there is a *.mod incompatibility between GCC 14 and
GCC 15, at least when using iso_c_binding or iso_fortran_env intrinsic
modules, because new entries have been added to those modules in the middle,
causing changes in the constants emitted in the *.mod files.
Also, I fear modules produced with GCC 15 with -funsigned and using UNSIGNED
in the modules will be unreadable by GCC 14.
The following patch just bumps MOD_VERSION for this.
Note, a patch for accepting also MOD_VERSION "15" has been posted
incrementally.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
PR fortran/118337
* module.cc (MOD_VERSION): Bump to "16".
aarch64_function_ok_for_sibcall required the caller and callee
to use the same PCS variant. However, it should be enough for the
callee to preserve at least as much register state as the caller;
preserving more state is fine.
ARM_PCS_AAPCS64, ARM_PCS_SIMD, and ARM_PCS_SVE agree on what
GPRs should be preserved. For the others:
Thus it's ok for something earlier in the list to tail call something
later in the list.
gcc/
PR target/107102
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall): Only
reject calls with different PCSes if the callee clobbers register
state that the caller must preserve.
gcc/testsuite/
PR target/107102
* gcc.target/aarch64/sve/sibcall_1.c: New test.
* gimplify.cc (gimplify_call_expr): Disable variant function's
append_args in 'omp dispatch' when invoking the variant directly
and not through the base function.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/append-args-4.c: New test.
* c-c++-common/gomp/append-args-5.c: New test.
the last of which wasn't converted to void and so we, since r15-6369,
do not take the "if (VOID_TYPE_P (type))" path, and try to set
D.2912 to false.
The last statement comes from build_disable_temp_cleanup.
convert_to_void is typically called from finish_expr_stmt, but we are
adding the cleanup statement via add_stmt which doesn't convert to void.
So I think we can use finish_expr_stmt instead.
PR c++/118169
gcc/cp/ChangeLog:
* typeck2.cc (split_nonconstant_init): Call finish_expr_stmt instead
of add_stmt.
Thomas Schwinge [Mon, 28 Nov 2022 09:37:26 +0000 (10:37 +0100)]
nvptx: Re-enable "Stack alignment causes use of alloca" test cases
These generally PASS nowadays, without requiring 'alloca'.
There were two exceptions: 'gcc.dg/torture/stackalign/pr16660-2.c',
'gcc.dg/torture/stackalign/pr16660-3.c', where variants specifying
'-O0' or '-fpic' FAILed with 'ptxas' of, for example, CUDA 10.0 due to:
nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped
That however is gone with 'ptxas' of, for example, CUDA 11.5 and later.
Jonathan Wakely [Mon, 23 Dec 2024 21:51:24 +0000 (21:51 +0000)]
libstdc++: Use preprocessor conditions in std module [PR118177]
The std-clib.cc module definition file assumes that all names are
available unconditionally, but that's not true for all targets. Use the
same preprocessor conditions as are present in the <cxxx> headers.
A similar change is needed in std.cc.in for the <chrono> features that
depend on the SSO std::string, guarded with a __cpp_lib_chrono value
indicating full C++20 support.
The conditions for <cmath> are omitted from this change, as there are a
large number of them. That probably needs to be fixed.
libstdc++-v3/ChangeLog:
PR libstdc++/118177
* src/c++23/std-clib.cc.in: Use preprocessor conditions for
names which are not always defined.
* src/c++23/std.cc.in: Likewise.
libstdc++: add initializer_list constructor to std::span (P2447R6)
This commit implements P2447R6. The code is straightforward (just one
extra constructor, with constraints and conditional explicit).
I decided to suppress -Winit-list-lifetime because otherwise it would
give too many false positives. The new constructor is meant to be used
as a parameter-passing interface (this is a design choice, see
P2447R6/§2) and, as such, the initializer_list won't dangle despite
GCC's warnings.
The new constructor isn't 100% backwards compatible. A couple of
examples are included in Annex C, but I have also lifted some more
from R4. A new test checks for the old and the new behaviors.
libstdc++-v3/ChangeLog:
* include/bits/version.def: Add the new feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/span: Add constructor from initializer_list.
* testsuite/23_containers/span/init_list_cons.cc: New test.
* testsuite/23_containers/span/init_list_cons_neg.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Jonathan Wakely [Wed, 11 Dec 2024 22:56:08 +0000 (22:56 +0000)]
libstdc++: Avoid redundant assertions in std::span constructors
Any std::span<T, N> constructor with a runtime length has a precondition
that the length is equal to N (except when N == std::dynamic_extent).
Currently every constructor with a runtime length does:
if constexpr (extent != dynamic_extent)
__glibcxx_assert(n == extent);
We can move those assertions into the __detail::__extent_storage<N>
constructor so they are only done in one place. To avoid checking the
assertions when we have a constant length we can add a second
constructor which is consteval and takes an integral_constant<size_t, N>
argument. The std::span constructors can pass a size_t for runtime
lengths and a std::integral_constant<size_t, N> for constant lengths
that don't need to be checked.
The __detail::__extent_storage<dynamic_extent> specialization only needs
one constructor, as a std::integral_constant<size_t, N> argument can
implicitly convert to size_t.
For the member functions that return a subspan with a constant extent we
return std::span<T,C>(ptr, C) which is redundant in two ways. Repeating
the constant length C when it's already a template argument is
redundant, and using the std::span(T*, size_t) constructor implies a
runtime length which will do a redundant assertion check. Even though
that assertion won't fail and should be optimized away, it's still
unnecessary code that doesn't need to be instantiated and then optimized
away again. We can avoid that by adding a new private constructor that
only takes a pointer (wrapped in a custom tag struct to avoid
accidentally using that constructor) and automatically sets _M_extent to
the correct value.
libstdc++-v3/ChangeLog:
* include/std/span (__detail::__extent_storage): Check
precondition in constructor. Add consteval constructor for valid
lengths and deleted constructor for invalid constant lengths.
Make member functions always_inline.
(__detail::__span_ptr): New class template.
(span): Adjust constructors to use a std::integral_constant
value for constant lengths. Declare all specializations of
std::span as friends.
(span::first<C>, span::last<C>, span::subspan<O,C>): Use new
private constructor.
(span(__span_ptr<T>)): New private constructor for constant
lengths.
Jonathan Wakely [Wed, 18 Dec 2024 12:57:14 +0000 (12:57 +0000)]
libstdc++: Handle errors from strxfrm in std::collate::transform [PR85824]
std::regex builds a cache of equivalence classes by calling
std::regex_traits<char>::transform_primary(c) for every char, which then
calls std::collate<char>::transform which calls strxfrm. On several
targets strxfrm fails for non-ASCII characters. Because strxfrm has no
return value reserved to indicate an error, some implementations return
INT_MAX or SIZE_MAX. This causes std::collate::transform to try to
allocate a huge buffer, which is either very slow or throws
std::bad_alloc. We should check errno after calling strxfrm to detect
errors and then throw a more appropriate exception instead of trying to
allocate a huge buffer.
Unfortunately the std::collate<C>::_M_transform function has a
non-throwing exception specifier, so we can't do the error handling
there.
As well as checking errno, this patch changes std::collate::do_transform
to use __builtin_alloca for small inputs, and to use RAII to deallocate
the buffers used for large inputs.
This change isn't sufficient to fix the three std::regex bugs caused by
the lack of error handling in std::collate::do_transform, we also need
to make std::regex_traits::transform_primary handle exceptions. This
change also attempts to make transform_primary closer to the effects
described in the standard, by not even attempting to use std::collate if
the locale's std::collate facet has been replaced (see PR 118105).
Implementing the correct effects for transform_primary requires RTTI, so
that we don't use some user-defined std::collate facet with unknown
semantics. When -fno-rtti is used transform_primary just returns an
empty string, making equivalence classes unusable in std::basic_regex.
That's not ideal, but I don't have any better ideas.
I'm unsure if std::regex_traits<C>::transform_primary is supposed to
convert the string to lower case or not. The general regex traits
requirements ([re.req] p20) do say "when character case is not
considered" but the specification for the std::regex_traits<char> and
std::regex_traits<wchar_t> specializations ([re.traits] p7) don't say
anything about that.
With the r15-6317-geb339c29ee42aa change, transform_primary is not
called unless the regex actually uses an equivalence class. But using an
equivalence class would still fail (or be incredibly slow) on some
targets. With this commit, equivalence classes should be usable on all
targets, without excessive memory allocations.
Arguably, we should not even try to call transform_primary for any char
values over 127, since they're never valid in locales that use UTF-8 or
7-bit ASCII, and probably for other charsets too. Handling 128
exceptions for every std::regex compilation is very inefficient, but at
least it now works instead of failing with std::bad_alloc, and no longer
allocates 128 x 2GB. Maybe for C++26 we could check the locale's
std::text_encoding and use that to decide whether to cache equivalence
classes for char values over 127.
libstdc++-v3/ChangeLog:
PR libstdc++/85824
PR libstdc++/94409
PR libstdc++/98723
PR libstdc++/118105
* include/bits/locale_classes.tcc (collate::do_transform): Check
errno after calling _M_transform. Use RAII type to manage the
buffer and to restore errno.
* include/bits/regex.h (regex_traits::transform_primary): Handle
exceptions from std::collate::transform and do not try to use
std::collate for user-defined facets.
Jonathan Wakely [Tue, 17 Dec 2024 21:32:19 +0000 (21:32 +0000)]
libstdc++: Fix std::future::wait_until for subsecond negative times [PR118093]
The current check for negative times (i.e. before the epoch) only checks
for a negative number of seconds. For a time 1ms before the epoch the
seconds part will be zero, but the futex syscall will still fail with an
EINVAL error. Extend the check to handle this case.
This change adds a redundant check in the headers too, so that we avoid
even calling into the library for negative times. Both checks can be
marked [[unlikely]]. The check in the headers avoids the cost of
splitting the time into seconds and nanoseconds and then making a PLT
call. The check inside the library matches where we were checking
already, and fixes existing binaries that were compiled against older
headers but use a newer libstdc++.so.6 at runtime.
libstdc++-v3/ChangeLog:
PR libstdc++/118093
* include/bits/atomic_futex.h (_M_load_and_test_until_impl):
Return false for times before the epoch.
* src/c++11/futex.cc (_M_futex_wait_until): Extend check for
negative times to check for subsecond times. Add unlikely
attribute.
(_M_futex_wait_until_steady): Likewise.
* testsuite/30_threads/future/members/118093.cc: New test.