git.ipfire.org Git - thirdparty/gcc.git/log

tailc: Improve tail recursion handling [PR119493]

This is a partial step towards fixing that PR.
For musttail recursive calls which have non-is_gimple_reg_type typed
parameters, the only case we've handled was if the exact parameter
was passed through (perhaps modified, but still the same PARM_DECL).
That isn't necessary, we can copy the argument to the parameter as well
(just need to watch for the use of the parameter in later arguments,
say musttail recursive call which swaps 2 structure arguments).
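
As a rough source-level illustration of the swap case (a hand-written
analogue, not the GIMPLE the pass emits; struct pair, walk_rec and
walk_loop are hypothetical names and the musttail attribute is left out
so that the sketch builds with any C compiler):

```c
/* Hypothetical aggregate; a struct like this is not a gimple register
   type, so it is the kind of parameter discussed above.  */
struct pair { long a[4]; };

/* Recursive form.  In the scenario above the recursive call would carry
   the musttail attribute; it is omitted here on purpose.  */
long
walk_rec (struct pair x, struct pair y, int n)
{
  if (n == 0)
    return x.a[0] - y.a[0];
  return walk_rec (y, x, n - 1);
}

/* Loop form, roughly what eliminating the tail recursion amounts to.
   x is still needed as the second argument after the first parameter
   has been overwritten, so it is saved in a temporary first.  */
long
walk_loop (struct pair x, struct pair y, int n)
{
  for (;;)
    {
      if (n == 0)
        return x.a[0] - y.a[0];
      struct pair tmp = x;  /* argument used later: copy it first */
      x = y;
      y = tmp;
      n = n - 1;
    }
}
```

Without the temporary, the second assignment would read the already
overwritten x and both parameters would end up holding the old y.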

The patch attempts to play safe and punts if any of the parameters are
addressable (like we do for all normal tail calls and tail recursions,
except for musttail in the posted unreviewed patch).

With this patch (at least when early inlining isn't done on not yet
optimized body) inlining should see already tail recursion optimized
body and will not have problems with SRA breaking musttail.

This version of the patch limits this for musttail tail recursions,
with intent to enable for all tail recursions in GCC 16.

2025-04-01  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Don't punt on tail recursion
if some arguments don't have is_gimple_reg_type, only punt if they
have non-POD types, or volatile, or addressable or (for now) it is
not a musttail call.  Set tailr_arg_needs_copy in those cases too.
(eliminate_tail_call): Copy call arguments to params if they don't
have is_gimple_reg_type, use temporaries if the argument is used
later.
(tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
tailr_arg_needs_copy parameters.  Formatting fix.

* gcc.dg/pr119493-1.c: New test.

combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
                    (const_int 0 [0])))
            (set (reg/v:SI 116 [ e ])
                (neg:SI (reg:SI 104 [ _7 ])))
        ]) "pr119291.c":26:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (ne:DI (reg:CCZ 17 flags)
            (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
            (set (pc)
                (pc))
            (set (reg/v:SI 116 [ e ])
                (const_int 0 [0]))
        ]) "pr119291.c":28:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).

This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.

As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
        (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
     (nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
     (nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).

The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.
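
To make the difference concrete, here is a toy model in C (not the
combiner's code; struct toy_insn, toy_modified_between and
toy_used_between are hypothetical and only cover plain pseudo
registers, not PC or MEM):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: writes DEST from at most one source register.
   A register number of -1 means "none".  */
struct toy_insn { int dest; int src; };

/* True if REG is written by an insn strictly between FROM and TO.  */
bool
toy_modified_between (const struct toy_insn *insns, size_t from,
                      size_t to, int reg)
{
  for (size_t i = from + 1; i < to; i++)
    if (insns[i].dest == reg)
      return true;
  return false;
}

/* True if REG is mentioned at all (read or written) strictly between
   FROM and TO.  */
bool
toy_used_between (const struct toy_insn *insns, size_t from,
                  size_t to, int reg)
{
  for (size_t i = from + 1; i < to; i++)
    if (insns[i].dest == reg || insns[i].src == reg)
      return true;
  return false;
}

/* Shape of the testcase: insn 22 sets r104, insn 23 copies r116 into
   r117, insn 25 sets r116.  */
const struct toy_insn toy_seq[] = {
  { 104, -1 },   /* i2: insn 22 */
  { 117, 116 },  /* insn 23, reads r116 */
  { 116, 104 },  /* i3: insn 25 */
};
```

In this model toy_modified_between (toy_seq, 0, 2, 116) is false while
toy_used_between (toy_seq, 0, 2, 116) is true, which is the situation
where moving the set of r116 up to the i2 position changes what insn 23
reads.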

Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) return false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).

On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> does the comment talk about memory?

I was worried about making too risky changes this late in stage4
(and especially also for backports).  Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).

What we know for sure is:
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set1) can certainly be a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).

Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes.  This part didn't change in my patch.

I've only changed those
-         && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+         && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.

The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3.  So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false.  Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch).  In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.

Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.

To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).

So, I think it would be safer to add && SET_DEST (set{1,0}) != pc_rtx
checks to preserve former behavior, like in the following version.

2025-04-01  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.

* gcc.c-torture/execute/pr119291.c: New test.

RISC-V: Tweak testcase for PIE

A Linux toolchain may be configured with --enable-default-pie, and that
causes lots of regression test failures because function names get an
@plt suffix appended (e.g. `call foo` becomes `call foo@plt`); some code
generation also differs due to the code model, such as the address
generation for global variables, so we add -fno-pie to those testcases
to prevent that.

We may consider just dropping the @plt suffix to avoid the problem
entirely, because there is no difference with or without the @plt suffix
(the linker will pick the right thing to do); however, it's a late stage
of GCC development, so just tweaking the testcases should be the best
approach for now.

Changes from v1:
- Add more testcases for PIE (from rvv.exp).
- Tweak the rule for matching @plt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Tweak testcase for PIE.
* gcc.target/riscv/rv32e_zcmp.c: Likewise.
* gcc.target/riscv/zcmp_stack_alignment.c: Likewise.
* gcc.target/riscv/cm_mv_rv32.c: Likewise.
* gcc.target/riscv/cpymem-64.c: Likewise.
* gcc.target/riscv/fmax-snan.c: Likewise.
* gcc.target/riscv/fmaxf-snan.c: Likewise.
* gcc.target/riscv/fmin-snan.c: Likewise.
* gcc.target/riscv/fminf-snan.c: Likewise.
* gcc.target/riscv/large-model.c: Likewise.
* gcc.target/riscv/predef-1.c: Likewise.
* gcc.target/riscv/predef-4.c: Likewise.
* gcc.target/riscv/predef-7.c: Likewise.
* gcc.target/riscv/predef-9.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-3.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-4.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-1.c: Likewise.
* gcc.target/riscv/rvv/base/movmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-3.c: Likewise.
* gcc.target/riscv/rvv/base/spill-9.c: Likewise.
* g++.target/riscv/mv-symbols1.C: Likewise.
* g++.target/riscv/mv-symbols3.C: Likewise.
* g++.target/riscv/mv-symbols4.C: Likewise.
* g++.target/riscv/mv-symbols5.C: Likewise.
* g++.target/riscv/mvc-symbols1.C: Likewise.
* g++.target/riscv/mvc-symbols3.C: Likewise.

tree-optimization/119534 - reject bogus emulated vectorized gather

The following makes sure to reject the attempts to emulate a vector
gather when the discovered index vector type is a vector mask.

PR tree-optimization/119534
* tree-vect-stmts.cc (get_load_store_type): Reject
VECTOR_BOOLEAN_TYPE_P offset vector type for emulated gathers.

* gcc.dg/vect/pr119534.c: New testcase.

c++: fix missing lifetime extension [PR119383]

Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR ->
TARGET_EXPR folding not to change its value category. That fix seems
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.

This patch reverts r15-8011 and instead handles the problem in
build_over_call by calling force_lvalue in the is_really_empty_class
case as well as in the general case.

PR c++/119383

gcc/cp/ChangeLog:

* call.cc (build_over_call): Use force_lvalue to ensure op= returns
an lvalue.
* cp-tree.h (force_lvalue): Declare.
* cvt.cc (force_lvalue): New.
* typeck.cc (cp_build_indirect_ref_1): Revert r15-8011.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/temp-extend3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Doc: -Wzero-as-null-pointer-constant is also available for C [PR119173]

The warning -Wzero-as-null-pointer-constant is now not only supported
in C++ but also in C. Change the documentation accordingly.
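
A minimal C sketch of the kind of code the option is about (the function
names are made up for illustration, and the exact diagnostic text is not
reproduced here):

```c
#include <stddef.h>

/* A literal 0 used as a null pointer constant; this is the pattern
   -Wzero-as-null-pointer-constant is meant to point out.  */
int *
flagged_ptr (void)
{
  int *p = 0;
  return p;
}

/* The same initialization spelled with NULL instead of 0.  */
int *
clean_ptr (void)
{
  int *p = NULL;
  return p;
}
```

Both functions return the same null pointer; only the spelling of the
constant differs.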

PR c/119173

gcc/ChangeLog:
* doc/invoke.texi (Warning Options): Move to general options.

profile: Another profiling musttail call fix [PR119535]

As the following testcase shows, EDGE_FAKE edges from musttail calls to
EXIT aren't the only edges we should ignore, we need to ignore also
edges created by the splitting of blocks for the EDGE_FAKE creation that
point from the musttail calls to the fallthrough block, which typically does
the return or with PHIs for the return value.

2025-04-01 Jakub Jelinek <jakub@redhat.com>

PR gcov-profile/119535
* profile.cc (branch_prob): Ignore any edges from bbs ending with
musttail call, rather than only EDGE_FAKE edges from those to EXIT.

* c-c++-common/pr119535.c: New test.

tailr: Punt on tail recursions that would break musttail [PR119493]

While working on the previous tailc patch, I've noticed the following
problem.
The testcase below fails, because we decide to tail recursion optimize
the call, but tail recursion (as documented in tree-tailcall.cc) needs to
add some result multiplication and/or addition if any tail recursion uses
accumulator, which is added right before the return.
So, if there are musttail non-recursive calls in the function, successful
tail recursion optimization will mean we'll later error on the musttail
calls. musttail recursive calls are ok, those would be tail recursion
optimized.

So, the following patch punts on all tail recursion optimizations if it
needs accumulators (add and/or mult) if there is at least one non-recursive
musttail call.
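
A hand-written sketch of why the accumulator is the problem (a
source-level analogue only; prod_rec, prod_loop and base_case are
hypothetical names, the accumulator is called m_acc here for
illustration, and the musttail attribute is left out so the code builds
with any C compiler):

```c
/* Stand-in for the non-recursive callee.  */
unsigned long
base_case (unsigned n)
{
  return n + 1;
}

/* Recursive form: the recursive call is multiplied by n, so tail
   recursion elimination needs a multiplicative accumulator.  Imagine
   the call to base_case being the musttail non-recursive call.  */
unsigned long
prod_rec (unsigned n)
{
  if (n <= 1)
    return base_case (n);
  return n * prod_rec (n - 1);
}

/* Loop form with the accumulator applied right before each return.  */
unsigned long
prod_loop (unsigned n)
{
  unsigned long m_acc = 1;
  for (;;)
    {
      if (n <= 1)
        return base_case (n) * m_acc;  /* no longer a tail call */
      m_acc *= n;
      n--;
    }
}
```

After the transformation the result of base_case is multiplied before
returning, so that call can no longer be emitted as a tail call.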

2025-04-01 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/119493
* tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion
candidates which need accumulators if there is at least one musttail
non-recursive call.

* gcc.dg/pr119493-2.c: New test.

gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]

The following testcases ICE on invalid code which defines
labels inside of statement expressions and then uses &&label
from code outside of the statement expressions.
The C++ FE diagnoses that with a warning (not specifically for
assume attribute, generically about taking address of a label
outside of a statement expression so computed goto could violate
the requirement that statement expression is not entered from
outside of it through a jump into it), the C FE doesn't diagnose
anything.
Normal direct gotos to such labels are diagnosed by both C and C++.
In the assume attribute case it is actually worse than for
addresses of labels in normal statement expressions, in that case
the labels are still in the current function, so invalid program
can still jump to those (and in case of OpenMP/OpenACC where it
is also invalid and stuff is moved to a separate function, such
movement is done post cfg discovery of FORCED_LABELs and worst
case one can run into cases which fail to assemble, but I haven't
succeeded in creating ICE for that).
For assume at -O0 we'd just throw away the assume expression if
it is not a simple condition and so the label is then not defined
anywhere and we ICE during cfg pass.
The gimplify.cc hunks fix that, as we don't have FORCED_LABELs
discovery done yet, it preserves all assume expressions which contain
used user labels.
With that we ICE during IRA, which is upset about an indirect jump
to a label which doesn't exist.
So, the gimple-low.cc hunks add diagnostics of the problem, it gathers
uids of all the user used labels inside of the assume expressions (usually
none) and if it finds any, walks the IL to find uses of those from outside
of those expressions now outlined into separate magic functions.

2025-04-01 Jakub Jelinek <jakub@redhat.com>

PR middle-end/119537
* gimplify.cc (find_used_user_labels): New function.
(gimplify_call_expr): Don't remove complex assume expression at -O0
if it defines any user labels.
* gimple-low.cc: Include diagnostic-core.h.
(assume_labels): New variable.
(diagnose_assume_labels): New function.
(lower_function_body): Call it via walk_gimple_seq if assume_labels
is non-NULL, then BITMAP_FREE assume_labels.
(find_assumption_locals_r): Record in assume_labels uids of user
labels defined in assume attribute expressions.

* c-c++-common/pr119537-1.c: New test.
* c-c++-common/pr119537-2.c: New test.

GCN: Don't emit weak undefined symbols [PR119369]

This resolves all instances of PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'";
for all affected test cases, the execution test status progresses FAIL -> PASS.

This however also causes a small number of (expected) regressions, very similar
to GCC/nvptx:

    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++11 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++26 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++17  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++26  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++98  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
    [-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}

    @@ -131211,25 +131211,25 @@ PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?c
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?e
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?g
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?j
    PASS: gcc.dg/weak/weak-1.c scan-assembler-not weak[^ \t]*[ \t]_?i

    PASS: gcc.dg/weak/weak-12.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-12.c scan-assembler weak[^ \t]*[ \t]_?foo

    PASS: gcc.dg/weak/weak-15.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-15.c scan-assembler-not weak[^ \t]*[ \t]_?b

    PASS: gcc.dg/weak/weak-16.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_index
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_table
    PASS: gcc.dg/weak/weak-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-2.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d

    PASS: gcc.dg/weak/weak-3.c  (test for warnings, line 58)
    PASS: gcc.dg/weak/weak-3.c  (test for warnings, line 73)
    PASS: gcc.dg/weak/weak-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1f
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1g
    PASS: gcc.dg/weak/weak-3.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d

    PASS: gcc.dg/weak/weak-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    @@ -131267,16 +131267,16 @@ PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k

    PASS: gcc.dg/weak/weak-5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1g
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1h
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1l

These get 'dg-xfail-if'ed or 'dg-skip-if'ed, (mostly) similar to GCC/nvptx.
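
For reference, a weak undefined symbol typically comes from source like
the following sketch (optional_hook and call_hook_if_present are
hypothetical names; the behavior shown assumes an ELF host toolchain and
that nothing else in the link defines optional_hook):

```c
/* Declared weak and never defined in this program, so the reference
   stays undefined at link time.  */
extern int optional_hook (void) __attribute__ ((weak));

/* Call the hook only if the linker resolved it to something.  */
int
call_hook_if_present (void)
{
  if (optional_hook)
    return optional_hook ();
  return -1;
}
```

On such a host the address of the undefined weak symbol compares equal
to a null pointer, so the fallback path is taken.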

PR target/119369
gcc/
* config/gcn/gcn-protos.h (gcn_asm_weaken_decl): Declare.
* config/gcn/gcn.cc (gcn_asm_weaken_decl): New.
* config/gcn/gcn-hsa.h (ASM_WEAKEN_DECL): '#define' to this.
gcc/testsuite/
* g++.dg/abi/pure-virtual1.C: 'dg-xfail-if' GCN.
* g++.dg/cpp0x/pr84497.C: 'dg-skip-if' GCN.
* g++.dg/ext/weak2.C: Likewise.
* gcc.dg/attr-weakref-1.c: Likewise.
* gcc.dg/weak/weak-1.c: Likewise.
* gcc.dg/weak/weak-12.c: Likewise.
* gcc.dg/weak/weak-15.c: Likewise.
* gcc.dg/weak/weak-16.c: Likewise.
* gcc.dg/weak/weak-2.c: Likewise.
* gcc.dg/weak/weak-3.c: Likewise.
* gcc.dg/weak/weak-4.c: Likewise.
* gcc.dg/weak/weak-5.c: Likewise.

GCN, libstdc++: '#define _GLIBCXX_USE_WEAK_REF 0' [PR119369]

This fixes a few hundred compilation/linking FAILs (similar to PR69506),
where the GCN/LLVM 'ld' reported:

    ld: error: relocation R_AMDGPU_REL32_LO cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
    >>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
    >>> referenced by cow-stdexcept.cc:259 ([...]/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
    >>>               cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a

    ld: error: relocation R_AMDGPU_REL32_HI cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
    >>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
    >>> referenced by cow-stdexcept.cc:259 ([...]/source-gcc/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
    >>>               cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a

    [...]

..., which is:

    $ c++filt _ZGTtnam
    transaction clone for operator new[](unsigned long)

..., and similarly for other libitm symbols.

However, the affected test cases, if applicable, then run into execution test
FAILs, due to PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'".

PR target/119369
libstdc++-v3/
* config/cpu/gcn/cpu_defines.h: New.
* configure.host [GCN] (cpu_defines_dir): Point to it.

target/119549 - fixup handling of -mno-sse4 in target attribute

The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.

PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.

* gcc.target/i386/pr119549.c: New testcase.

OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559]

gcc/ChangeLog:

PR middle-end/119559
* gimplify.cc (modify_call_for_omp_dispatch): Reorder checks to avoid
asserts and bogus diagnostic.

libquadmath: Avoid old-style function definition warnings

I've noticed
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/strtod/strtod_l.c:456:22: warning: old-style function definition [-Wold-style-definition]
warnings during bootstrap (clearly since the switch to -std=gnu23 by default).

The following patch fixes those in libquadmath, the only other warnings are
in zlib.
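
For illustration, the difference between the two definition styles looks
like this (toy_scale is a made-up function, not one of the libquadmath
routines touched by the patch):

```c
/* Old-style (K&R) definition, the form -Wold-style-definition reports.
   Kept in a comment so that this sketch compiles cleanly:

     long
     toy_scale (x, factor)
          long x;
          long factor;
     {
       return x * factor;
     }
*/

/* Prototype-style definition of the same function.  */
long
toy_scale (long x, long factor)
{
  return x * factor;
}
```

The body is unchanged; only the parameter declarations move into the
parenthesized list.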

2025-04-01 Jakub Jelinek <jakub@redhat.com>

* strtod/strtod_l.c (____STRTOF_INTERNAL): Avoid old-style function
definitions.
* printf/addmul_1.c (mpn_addmul_1): Likewise.
* printf/mul_1.c (mpn_mul_1): Likewise.
* printf/submul_1.c (mpn_submul_1): Likewise.

RISC-V: testsuite: Fix broken testsuite error of zicbop

Fix broken testsuite like
"ERROR: gcc.target/riscv/cmo-zicbop-2.c -Os : 1: too many arguments for " dg-do 1 compile target { { rv32-*-*}} "

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Fix missing { before target .
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/prefetch-zicbop.c: Likewise.
* gcc.target/riscv/prefetch-zihintntl.c: Likewise.

i386: Add attr_isa for vaes patterns to sync with attr gpr16. [PR119473]

For vaes patterns with jm constraint and gpr16 attr, the "isa" attr is
required to distinguish the avx/avx512 alternatives in
ix86_memory_address_reg_class.
Also add missing type and mode attributes for those vaes patterns.

gcc/ChangeLog:

PR target/119473
* config/i386/sse.md
(vaesdec_<mode>): Set attr "isa" as "avx,vaes_avx512vl", "type" as
"sselog1", "mode" as "TI".
(vaesdeclast_<mode>): Ditto.
(vaesenc_<mode>): Ditto.
(vaesenclast_<mode>): Ditto.

gcc/testsuite/ChangeLog:

PR target/119473
* gcc.target/i386/pr119473.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>

RISC-V: Fix wrong LMUL when only implicit zve32f.

According to Section 3.4.2, Vector Register Grouping, in the RISC-V
Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN
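
The rule can be checked with integer arithmetic only. A small sketch
(lmul_satisfies_rule is a hypothetical helper, not part of the patch;
LMUL is passed as a fraction num/den, e.g. 1/8 for mf8):

```c
#include <stdbool.h>

/* Return true when LMUL = num/den satisfies LMUL >= SEW/ELEN.
   Cross-multiplied to avoid fractions: num * ELEN >= SEW * den.  */
bool
lmul_satisfies_rule (unsigned num, unsigned den, unsigned sew,
                     unsigned elen)
{
  return num * elen >= sew * den;
}
```

For example, with ELEN = 32 and SEW = 8 the smallest LMUL that passes is
1/4 (1 * 32 >= 8 * 4), so LMUL = 1/8 fails (1 * 32 < 8 * 8), whereas
with ELEN = 64 it passes (1 * 64 >= 8 * 8).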

Changes since V2:
- Add check on vector-iterators.md
- Add one more testcase to check that VLS uses the correct mode.

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Add restrict for insert LMUL.
* config/riscv/riscv-vector-builtins-types.def:
Use RVV_REQUIRE_ELEN_64 to check LMUL number.
* config/riscv/riscv-vector-switch.def: Likewise.
* config/riscv/vector-iterators.md: Check TARGET_VECTOR_ELEN_64
rather than "TARGET_MIN_VLEN > 32" for all iterators.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111391-2.c: Update test.
* gcc.target/riscv/rvv/base/abi-14.c: Update test.
* gcc.target/riscv/rvv/base/abi-16.c: Update test.
* gcc.target/riscv/rvv/base/abi-18.c: Update test.
* gcc.target/riscv/rvv/base/vsetvl_zve32-1.c: New test.
* gcc.target/riscv/rvv/base/vsetvl_zve32-2.c: New test.

Co-authored-by: Kito Cheng <kito.cheng@sifive.com>

LoongArch: doc: Put the '-mtls-dialect=opt' option description in the correct position.

gcc/ChangeLog:

* doc/invoke.texi: Corrected the position of '-mtls-dialect=opt'
option.

Daily bump.

MAINTAINERS: Update my name

ChangeLog:

* MAINTAINERS: Update my name.

libstdc++: Fix -Warray-bounds warning in std::vector<bool> [PR110498]

In this case, we need to tell the compiler that the current size is not
larger than the new size so that all the existing elements can be copied
to the new storage. This avoids bogus warnings about overflowing the new
storage when the compiler can't tell that that cannot happen.

We might as well also hoist the loads of begin() and end() before the
allocation too. All callers will have loaded at least begin() before
calling _M_reallocate.

libstdc++-v3/ChangeLog:

PR libstdc++/110498
* include/bits/vector.tcc (vector<bool, A>::_M_reallocate):
Hoist loads of begin() and end() before allocation and use them
to state an unreachable condition.
* testsuite/23_containers/vector/bool/capacity/110498.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Fix -Wstringop-overread warning in std::vector<bool> [PR114758]

As in r13-4393-gcca06f0d6d76b0 and a few other commits, we can avoid
bogus warnings in std::vector<bool> by hoisting some loads to before the
allocation that calls operator new. This means that the compiler has
enough info to remove the dead branches that trigger bogus warnings.

On trunk this is only needed with -fno-assume-sane-operators-new-delete
but it will help on the branches where that option doesn't exist.

libstdc++-v3/ChangeLog:

PR libstdc++/114758
* include/bits/vector.tcc (vector<bool, A>::_M_fill_insert):
Hoist loads of begin() and end() before allocation.
* testsuite/23_containers/vector/bool/capacity/114758.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]

In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro
AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname
object if tm.tm_zone isn't supported. That link-test isn't allowed for
cross-compilation, so bootstrap fails if tm.tm_zone isn't supported.

Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway,
we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom
macro that only checks for tm.tm_zone. We can improve on the Autoconf
macro by checking it's a suitable type, which isn't actually checked by
AC_STRUCT_TIMEZONE.
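
The real check is an Autoconf macro, which is not reproduced here.  As a
compile-only analogue in C++, the following sketch shows what "has a
tm_zone member of a suitable type" can mean without any link-test.  The
trait and the three test structs are hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <type_traits>
#include <utility>

// Primary template: no usable tm_zone member.
template<typename T, typename = void>
struct has_tm_zone : std::false_type {};

// Selected only when T has a tm_zone member convertible to const char*.
template<typename T>
struct has_tm_zone<T, typename std::enable_if<
  std::is_convertible<decltype (std::declval<T &> ().tm_zone),
                      const char *>::value>::type>
  : std::true_type {};

// Hypothetical types used only to exercise the trait.
struct WithZone { const char *tm_zone; };
struct WrongType { int tm_zone; };
struct NoZone { int tm_sec; };
```

Evaluating has_tm_zone<std::tm>::value then answers the question for the
target's own struct tm purely at compile time.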

libstdc++-v3/ChangeLog:

PR libstdc++/119550
* acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE.
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Check
_GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of
_GLIBCXX_HAVE_STRUCT_TM_TM_ZONE.

Update gcc sv.po

* sv.po: Update.

gccrs: Fix SEGV when type path resolver fails outright

When we resolve paths to types, we first walk each segment to the last
module, which has no type.  If the child of a module is then not found,
we are left with a null root_tyty, which needs to be caught and turned
into an ErrorType node.

Fixes Rust-GCC#3613

gcc/rust/ChangeLog:

* typecheck/rust-hir-type-check-type.cc (TypeCheckType::resolve_root_path):
catch nullptr root_tyty

gcc/testsuite/ChangeLog:

* rust/compile/issue-3613.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: fix crash in parse repr options and missing delete call

Fixes Rust-GCC#3606

gcc/rust/ChangeLog:

* typecheck/rust-hir-type-check-base.cc (TypeCheckBase::parse_repr_options):
check for null and empty and add missing delete call

gcc/testsuite/ChangeLog:

* rust/compile/issue-3606.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: fix ice when setting up regions

The number of regions is based on the used region arguments, which can be
fewer than the substitutions require. So let's check for that and allow
anon regions to be created for them.

Fixes Rust-GCC#3605

gcc/rust/ChangeLog:

* typecheck/rust-tyty-subst.h: check for min range

gcc/testsuite/ChangeLog:

* rust/compile/issue-3605.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE for malformed repr attribute

Fixes Rust-GCC#3614

gcc/rust/ChangeLog:

* typecheck/rust-hir-type-check-base.cc (TypeCheckBase::parse_repr_options): check for input

gcc/testsuite/ChangeLog:

* rust/compile/issue-3614.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when working with HIR::BareFunctionType

Fixes Rust-GCC#3615

gcc/rust/ChangeLog:

* hir/rust-hir-dump.cc (Dump::visit): check has type
* hir/tree/rust-hir-type.cc (BareFunctionType::BareFunctionType): likewise

gcc/testsuite/ChangeLog:

* rust/compile/issue-3615.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE in array ref constexpr

Since 898d55ad7e2 was fixed to remove the VIEW_CONVERT_EXPR from
array expressions we can now turn on the array element access
const expr.

Fixes Rust-GCC#3563

gcc/rust/ChangeLog:

* backend/rust-constexpr.cc (eval_store_expression): turn this back on

gcc/testsuite/ChangeLog:

* rust/compile/issue-3563.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Add ending newline to rust-macro-builtins-log-debug.cc

gcc/rust/ChangeLog:

* expand/rust-macro-builtins-log-debug.cc:
Add newline to end of file.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

gccrs: nr2.0: Rename prelude to lang_prelude

gcc/rust/ChangeLog:

* resolve/rust-forever-stack.h
(ForeverStack::get_prelude): Rename to...
(ForeverStack::get_lang_prelude): ...here.
(ForeverStack::prelude): Rename to...
(ForeverStack::lang_prelude): ...here.
(ForeverStack::ForeverStack): Handle renames.
* resolve/rust-forever-stack.hxx
(ForeverStack::push_inner): Likewise.
(ForeverStack::resolve_segments): Likewise.
(ForeverStack::resolve_path): Likewise.
(ForeverStack::get_prelude): Rename to...
(ForeverStack::get_lang_prelude): ...here and handle renames.
* resolve/rust-late-name-resolver-2.0.cc
(Late::visit): Handle renames.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

gccrs: nr2.0: Fix test macros/mbe/macro43.rs

gcc/testsuite/ChangeLog:

* rust/compile/macros/mbe/macro43.rs: Adjust test to pass with
name resolution 2.0.
* rust/compile/nr2/exclude: Remove macros/mbe/macro43.rs.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

gccrs: Fix ICE during const expr eval on array expressions

Array expressions are still getting turned into VIEW_CONVERT_EXPR's because
TYPE_MAIN_VARIANT is not set, so we might as well reuse the type-hasher
to sort this out.

Fixes Rust-GCC#3588

gcc/rust/ChangeLog:

* backend/rust-compile-context.h: only push named types
* backend/rust-compile-type.cc (TyTyResolveCompile::visit): run the type hasher

gcc/testsuite/ChangeLog:

* rust/compile/issue-3588.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when compiling path which resolves to trait constant

Fixes Rust-GCC#3552

gcc/rust/ChangeLog:

* backend/rust-compile-resolve-path.cc (HIRCompileBase::query_compile): check for Expr trait
* hir/rust-hir-dump.cc (Dump::visit): expr is optional

gcc/testsuite/ChangeLog:

* rust/compile/issue-3552.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Add new test to highlight namespace for self import

gcc/testsuite/ChangeLog:

* rust/compile/self_import_namespace.rs: New test.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: Update exclusion list

gcc/testsuite/ChangeLog:

* rust/compile/nr2/exclude: Remove now passing tests from exclusion
list.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: Resolve module final self segment in use decls

A lowercase self suffix in a path was not resolved properly; it should
point to the module right before it.

gcc/rust/ChangeLog:

* resolve/rust-forever-stack.hxx: Add a new specialized function
to retrieve the last "real" segment depending on the namespace.
* resolve/rust-forever-stack.h: Add new function prototype.
* resolve/rust-early-name-resolver-2.0.cc (Early::finalize_rebind_import):
Set declared name according to the selected segment, if there is a self
suffix in the use declaration then select the previous segment.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: Give the builtin unit struct an actual locus

This has been a pet peeve of mine for a while: the gimple never emitted
the struct () name properly, it was always empty, because record types
require a real locus or they don't get a proper name.

gcc/rust/ChangeLog:

* backend/rust-compile-base.cc (HIRCompileBase::unit_expression): pass ctx
* backend/rust-compile-base.h: cant be static
* backend/rust-compile-intrinsic.cc (try_handler_inner): pass ctx
* backend/rust-compile-type.cc
(TyTyResolveCompile::get_unit_type): update to grab the first locus
(TyTyResolveCompile::visit): pass ctx
* backend/rust-compile-type.h: likewise

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when doing method resolution on trait predicates

We need to ensure we are adding methods to the possible candidates.

Fixes Rust-GCC#3554

gcc/rust/ChangeLog:

* typecheck/rust-hir-dot-operator.cc:

gcc/testsuite/ChangeLog:

* rust/compile/issue-3554-1.rs: New test.
* rust/compile/issue-3554-2.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when using super midway through path

Fixes Rust-GCC#3568

gcc/rust/ChangeLog:

* resolve/rust-ast-resolve-path.cc (ResolvePath::resolve_path): check for super mid path

gcc/testsuite/ChangeLog:

* rust/compile/nr2/exclude: nr2 puts out a different error multiple times
* rust/compile/issue-3568.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when compiling block expressions in array capacity

We need to reuse the existing compile_constant_item helper which handles
the case if this is a simple expression, fn-call or a block expression.
The patch extracts out this helper as a static method so this can be used
in more places.

Fixes Rust-GCC#3566

gcc/rust/ChangeLog:

* backend/rust-compile-base.cc (HIRCompileBase::address_expression): new helper constexpr
* backend/rust-compile-base.h: prototype
* backend/rust-compile-type.cc (TyTyResolveCompile::visit): call constexpr helper

gcc/testsuite/ChangeLog:

* rust/compile/issue-3566-1.rs: New test.
* rust/compile/issue-3566-2.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Add check for super traits being implemented by Self

We need to recursively check the super traits of the predicate the Self
type is trying to implement. Otherwise it cannot implement it.

Fixes Rust-GCC#3553

gcc/rust/ChangeLog:

* typecheck/rust-hir-type-check-item.cc (TypeCheckItem::resolve_impl_block_substitutions):
Track the polarity
* typecheck/rust-tyty-bounds.cc (TypeBoundPredicate::validate_type_implements_this):
new validator
* typecheck/rust-tyty.h: new prototypes

gcc/testsuite/ChangeLog:

* rust/compile/issue-3553.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix ICE when array elements are not a value

We need to check for error_mark_node when doing adjustments from coercion
sites, otherwise we hit assertions as part of the coercion. That fixes the
ICE, but the reason for the error_mark_node is that the array element is
not a value.

Fixes Rust-GCC#3567

gcc/rust/ChangeLog:

* backend/rust-compile-expr.cc (CompileExpr::array_value_expr): add value chk for array expr

gcc/testsuite/ChangeLog:

* rust/compile/issue-3567.rs: New test.

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix core library test with proper canonical path

Import from core library was wrong, it misses several crate directives
since we're no longer dealing with multiple files.

gcc/testsuite/ChangeLog:

* rust/compile/issue-2905-2.rs: Import from core library into a single
file misses the crate directives.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: fix unconstrained infer vars on generic associated type

The trick here is that when Bar::test is resolved it resolves to the
trait method:

  fn <Bar<i32>, T> (placeholder) -> placeholder

Which is fine so we need to setup the associated types for Bar<i32> which
means looking up the associated impl block then setting up the projection
of A = T so it becomes:

  fn <Bar<i32>, T> (placeholder: projection<T>:T)
    -> placeholder: projection<T>:T

But previously it was auto injecting inference variables so it became:

  fn <Bar<i32>, T> (placeholder: projection<T>:?T)
    -> placeholder: projection<T>:?T

The issue is that the binding of the generics was still T, so this caused
inference variables to be injected again but unlinked. A possible tweak
would be, when we are substituting again with new infer vars, to actually
just unify them in place so they are all part of the chain. This still
might be needed but let's hold off for now.

So basically when we are path probing we don't allow GATs to generate new
inference vars because they won't be bound to this current segment, which
just causes confusion.

Fixes Rust-GCC#3242

gcc/rust/ChangeLog:

* typecheck/rust-hir-trait-reference.h: add default infer arg
* typecheck/rust-hir-trait-resolve.cc: dont add new infer vars
* typecheck/rust-hir-type-check-path.cc (TypeCheckExpr::resolve_segments): dont infer

gcc/testsuite/ChangeLog:

* rust/compile/issue-3242.rs: no longer skip the test

Signed-off-by: Philip Herron <herron.philip@googlemail.com>

gccrs: Fix validation of constant items

gcc/rust/ChangeLog:

* checks/errors/rust-ast-validation.cc
(ASTValidation::visit): Allow constant items lacking expressions
if and only if they're associated with a trait definition, not a
trait implementation.

gcc/testsuite/ChangeLog:

* rust/compile/issue-3541-1.rs: New test.
* rust/compile/issue-3541-2.rs: Likewise.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

gccrs: testsuite: Add more testcases for cfg() in core

gcc/testsuite/ChangeLog:

* rust/compile/cfg-core1.rs: New test.
* rust/compile/cfg-core2.rs: New test.

gccrs: Lower raw string literals

gcc/rust/ChangeLog:

* hir/rust-ast-lower-base.cc
(ASTLoweringBase::lower_literal): Lower raw string literals into
normal string literals.

gcc/testsuite/ChangeLog:

* rust/compile/issue-3549.rs: New test.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

rust: Lower minimum supported Rust version to 1.49

gcc/rust/ChangeLog:

* checks/errors/borrowck/ffi-polonius/Cargo.lock: Regenerate.
* checks/errors/borrowck/ffi-polonius/Cargo.toml: Update to use source patching instead of
vendoring, lower edition to 2018.
* checks/errors/borrowck/ffi-polonius/vendor/log/Cargo.toml: Change edition to 2018.
* checks/errors/borrowck/ffi-polonius/vendor/log/src/lib.rs: Remove uses of unstable
feature.
* checks/errors/borrowck/ffi-polonius/.cargo/config.toml: Removed.

libgrust/ChangeLog:

* libformat_parser/Makefile.am: Avoid using --config as it is unsupported by cargo 1.49.
* libformat_parser/Makefile.in: Regenerate.
* libformat_parser/generic_format_parser/src/lib.rs: Use extension trait for missing
features.
* libformat_parser/src/lib.rs: Likewise.
* libformat_parser/.cargo/config: Moved to...
* libformat_parser/.cargo/config.toml: ...here.

gccrs: nr2.0: Fix test const_generics_3.rs

gcc/testsuite/ChangeLog:

* rust/compile/const_generics_3.rs: Modify test to run with name
resolution 2.0 only and to handle the absence of a bogus
resolution error.
* rust/compile/nr2/exclude: Remove const_generics_3.rs.

Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>

gccrs: lower: Handle let-else properly

gcc/rust/ChangeLog:

* hir/tree/rust-hir-stmt.h (class LetStmt): Add optional diverging else expression.
* hir/tree/rust-hir-stmt.cc: Likewise.
* hir/rust-ast-lower-stmt.cc (ASTLoweringStmt::visit): Add handling for lowering
diverging else.

gccrs: name-resolution: Handle let-else properly

gcc/rust/ChangeLog:

* resolve/rust-ast-resolve-stmt.h: Add handling for diverging else.
* resolve/rust-late-name-resolver-2.0.cc (Late::visit): Likewise.

gccrs: dump: Handle let-else properly

gcc/rust/ChangeLog:

* ast/rust-ast-collector.cc (TokenCollector::visit): Add handling for diverging else
expression.

gccrs: parser: Parse let-else statements

gcc/rust/ChangeLog:

* parse/rust-parse-impl.h (Parser::parse_let_stmt): Add new parsing in case of `else` token.

gccrs: ast: Add optional diverging else to AST::LetStmt

gcc/rust/ChangeLog:

* ast/rust-stmt.h (class LetStmt): Add optional expression for diverging else.
* ast/rust-ast-builder.cc (Builder::let): Use new API.

gccrs: Remove now passing test from exclusion list

Those tests were malformed and failed with the new name resolution
because of it.

gcc/testsuite/ChangeLog:

* rust/compile/nr2/exclude: Remove test from exclusion list.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: Fix testcase module path

Those tests are coming from libcore and module inlining was wrong, in
libcore there was a use declaration to import those modules which was
missing here.

gcc/testsuite/ChangeLog:

* rust/compile/issue-2330.rs: Use complete path from crate root.
* rust/compile/issue-1901.rs: Likewise.
* rust/compile/issue-1981.rs: Likewise.
* rust/compile/iterators1.rs: Likewise.
* rust/compile/sizeof-stray-infer-var-bug.rs: Likewise.
* rust/compile/for-loop1.rs: Likewise.
* rust/compile/for-loop2.rs: Likewise.
* rust/compile/torture/builtin_abort.rs: Likewise.
* rust/compile/torture/uninit-intrinsic-1.rs: Likewise.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

gccrs: Fix function name to printf

Function could not be found and triggered an error message.

gcc/testsuite/ChangeLog:

* rust/compile/feature_rust_attri0.rs: Add extern
function declaration and change name to printf.
* rust/compile/nr2/exclude: Remove now passing test from exclusion
list.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

d: Bump front-end language version to v2.111.0

Merges the front-end language implementation and runtime library with
upstream dmd c6863be720, and the standard library with phobos 60034b56e.

Synchronizing with the upstream release of v2.111.0.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd c6863be720.
* dmd/VERSION: Bump version to v2.111.0.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime c6863be720.
* src/MERGE: Merge upstream phobos 60034b56e.

Only write gcov when file output is on [PR119553]

gcov_write_* functions must be guarded so they only are called when
output_to_file is true, like for -fcondition-coverage, otherwise it
triggers an invalid read as detected by valgrind. The gcno file is
mostly written to from profile.cc, so it doesn't make too much sense
to hide it in path-coverage.cc. The find_paths name was a bit
imprecise, and is renamed to instrument_prime_paths.

PR gcov-profile/119553

gcc/ChangeLog:

* path-coverage.cc (find_paths): Return path count, don't
write to gcno, and rename to ...
(instrument_prime_paths): ... this.
* profile.cc (branch_prob): Write path counts to gcno.

libstdc++: Tweak linker script to avoid conflict on Solaris

The new symbols for the _M_construct<bool> function template match an
existing pattern in the GLIBCXX_3.4.21 version, as well as the intended
pattern in the GLIBCXX_3.4.34 version. That causes a linker error on
Solaris.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Make
std::basic_string::_M_construct patterns more precise.

d: Fix error with -Warray-bounds and -O2 [PR117002]

The record layout of class types in D doesn't get any tail padding, so it
is possible for the `classInstanceSize' to not be a multiple of the
`classInstanceAlignment'.

Rather than setting the instance alignment on the underlying
RECORD_TYPE, instead give the type an alignment of 1, which will mark it
as TYPE_PACKED. The value of `classInstanceAlignment' is instead
applied to the DECL_ALIGN of both the static `init' symbol, and the
stack allocated variable used when generating `new' for a `scope' class.

PR d/117002

gcc/d/ChangeLog:

* decl.cc (aggregate_initializer_decl): Set explicit decl alignment of
class instance.
* expr.cc (ExprVisitor::visit (NewExp *)): Likewise.
* types.cc (TypeVisitor::visit (TypeClass *)): Mark the record type of
classes as packed.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr117002.d: New test.

libstdc++: Make operator== for std::tuple convert to bool [PR119545]

The boolean-testable requirements don't require the type to be copyable,
so we need to convert to bool before it might need to be copied.
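
A minimal sketch of the idea, not the libstdc++ code: the result type
below is convertible to bool but cannot be copied, and the comparison
helper converts each result to bool straight away.  All names are
hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical boolean-testable result type: convertible to bool, but
// not copyable.
struct Verdict
{
  bool value;
  Verdict (bool v) : value (v) {}
  Verdict (const Verdict &) = delete;
  operator bool () const { return value; }
};

static_assert (!std::is_copy_constructible<Verdict>::value,
               "Verdict must not be copyable for this illustration");

struct Item
{
  int id;
};

// Comparison returning the non-copyable type; the braced return
// initializes the result in place, so no copy is needed.
Verdict operator== (const Item &a, const Item &b)
{
  return {a.id == b.id};
}

// Converts each comparison result to bool immediately, so no Verdict
// object ever has to be copied or stored.
bool pair_equal (const Item &a0, const Item &a1,
                 const Item &b0, const Item &b1)
{
  return bool (a0 == b0) && bool (a1 == b1);
}
```

Here pair_equal compares a0 with b0 and a1 with b1, which mirrors an
element-wise comparison of two 2-tuples.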

libstdc++-v3/ChangeLog:

PR libstdc++/119545
* include/std/tuple (operator==): Convert comparison results to
bool.
* testsuite/20_util/tuple/comparison_operators/119545.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

c++: fix reporting routines re-entered [PR119303]

We crash while we call warning_at ("inline function used but never defined")
since it invokes dump_template_bindings -> tsubst -> ... -> convert_like ->
... -> c_common_truthvalue_conversion -> warning_at ("enum constant in boolean
context")

cp_truthvalue_conversion correctly gets complain=0 but it calls
c_common_truthvalue_conversion from c-family which doesn't have
a similar parameter.

We can fix this by tweaking diagnostic_context::report_diagnostic to
check for recursion after checking if the diagnostic was enabled.
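
The ordering can be pictured with a toy model.  This is only a sketch
under simplifying assumptions: the struct, its members and the flag that
stands in for the re-entrancy failure are hypothetical and far simpler
than diagnostic_context.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; not GCC's diagnostic_context.
struct ToyDiagnostics
{
  int lock = 0;                     // non-zero while a report is in progress
  bool nested_report_seen = false;  // stands in for the re-entrancy failure
  std::vector<std::string> emitted;

  // enabled: whether this diagnostic is enabled at all.
  // nested / nested_enabled: optionally report a second diagnostic while
  // the first one is still being produced.
  bool report (const std::string &msg, bool enabled,
               const std::string &nested = "", bool nested_enabled = false)
  {
    // Check enablement first: a disabled diagnostic returns here and
    // never reaches the re-entrancy check below.
    if (!enabled)
      return false;

    if (lock > 0)
      {
        nested_report_seen = true;
        return false;
      }

    ++lock;
    if (!nested.empty ())
      report (nested, nested_enabled);
    emitted.push_back (msg);
    --lock;
    return true;
  }
};
```

In this model a disabled nested diagnostic is dropped quietly, and only an
enabled one raised during another report trips the re-entrancy flag.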

PR c++/116960
PR c++/119303

gcc/ChangeLog:

* diagnostic.cc (diagnostic_context::report_diagnostic): Check for
non-zero m_lock later, after checking diagnostic_enabled.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval26.C: New test.
* g++.dg/warn/undefined2.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

aarch64: Remove +sme -> +sve2 feature flag dependency

As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2. However, we don't
support SME without SVE2 and bail out with a 'sorry' if this configuration is
encountered. We may choose to support this in the future.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as
prerequisite and add in FCMA and F16FML.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Diagnose use of SME without SVE2 and implicitly enable SVE2 when
enabling SME after streaming mode diagnosis.
* doc/invoke.texi (sme): Document that this can only be used with the
sve2 extension.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/no-sve-with-sme-1.c: New.
* gcc.target/aarch64/no-sve-with-sme-2.c: New.
* gcc.target/aarch64/no-sve-with-sme-3.c: New.
* gcc.target/aarch64/no-sve-with-sme-4.c: New.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Pass +sve2 to existing
+sme pragma.
* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_int_opt_single_1.c:
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_4.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_uint_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_int_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_uint_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_slice_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_slice_1.c: Likewise.

c++: lambda in function template signature [PR119401]

Here we instantiate the lambda three times in producing A<0>::f:
1) in tsubst_function_type, substituting the type of A<>::f
2) in tsubst_function_decl, substituting the parameters of A<>::f
3) in regenerate_decl_from_template when instantiating A<>::f

The first one gets thrown away by maybe_rebuild_function_decl_type.  Before
r15-7202, we happily built all of them and mangled the result wrongly as
lambda #3.  After r15-7202, we try to mangle #3 as #1, which breaks because
#1 is already mangled as #1.

This patch avoids building #3 by suppressing regenerate_decl_from_template
if the template signature includes a lambda, fixing the ICE.

We now mangle the lambda as #2, which is still wrong.  Addressing that
should involve not calling tsubst_function_type from tsubst_function_decl,
and building the type from the parms types in the first place rather than
fixing it up in maybe_rebuild_function_decl_type.

PR c++/119401

gcc/cp/ChangeLog:

* pt.cc (regenerate_decl_from_template): Don't regenerate if the
signature involves a lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ11.C: New test.

tree-optimization/119532 - ICE with fixed-point tail recursion

The following disables tail recursion optimization when fixed-point
types are involved as we cannot generate -1 for all fixed-point
types.
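
For context, a hand-written sketch of why -1 is needed: when the result
of the recursive call is negated, turning the recursion into a loop means
carrying a multiplier that is scaled by -1 on each step.  The functions
below are illustrative only and use int, not a fixed-point type; they do
not show what the pass actually emits.

```cpp
#include <cassert>

// Recursive form: the result of the recursive call is negated.
int alternating (unsigned n)
{
  if (n == 0)
    return 1;
  return -alternating (n - 1);
}

// Loop form with a multiplicative accumulator.  Each former recursive
// step multiplies the accumulator by -1, so the type must be able to
// represent -1.
int alternating_loop (unsigned n)
{
  int mult_acc = 1;
  while (n != 0)
    {
      mult_acc *= -1;
      --n;
    }
  // Base-case value scaled by the accumulated multiplier.
  return mult_acc * 1;
}
```

For a type that has no representation of -1, the accumulator in the loop
form cannot be built, which matches the reason given for failing on
fixed-point types.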

PR tree-optimization/119532
* tree-tailcall.cc (process_assignment): FAIL for fixed-point
typed functions.

* gcc.dg/torture/pr119532.c: New testcase.

arm: testsuite: fix vect-fmaxmin.c test

This is another case of a test that was both an executable test
requiring specific hardware and an assembler scan test.  The
requirement for the hardware was masking some useful testing that
could be done (by scanning the assembly output) on almost all test
runs.  Fixed in a similar manner to fmaxmin{,-2}.c by splitting the
test into two, one that scans the assembler output and one that
executes the compiled code if suitable hardware is available.

The masked issue was that this test was expecting vectorization to
occur that was incorrect given the options passed.  For correct
vectorization we need -funsafe-math-optimizations as the vector
version of the single-precision operation will apply a truncation of
denormal values.

gcc/testsuite/ChangeLog:

* gcc.target/arm/vect-fmaxmin-2.c: New compile test.  Split from ...
* gcc.target/arm/vect-fmaxmin.c: ... here.  Remove scan-assembler
subtests.  For both, add -funsafe-math-optimizations.

OpenMP: modify_call_for_omp_dispatch - fix invalid memory access after 'error' [PR119541]

OpenMP requires that the number of dispatch 'interop' clauses (ninterop)
is less than or equal to the number of declare variant 'append_args' interop
objects (nappend).

While 'nappend < ninterop' was diagnosed as an error, the processing continued,
which led to an invalid out-of-bounds memory access. Solution: only
process the first nappend 'interop' clauses.
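
A minimal sketch of the clamping idea, not the gimplify.cc code.  The
function name, the use of int for clause values and the error flag are
assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: copy 'interop' clause values into nappend
// 'append_args' slots without ever indexing past either list.
std::vector<int> fill_append_args (const std::vector<int> &interop,
                                   std::size_t nappend, bool &error)
{
  const std::size_t ninterop = interop.size ();
  error = nappend < ninterop;   // the mismatch is still reported

  std::vector<int> slots (nappend, 0);
  // Clamp the loop bound so slots[i] stays in range even after the
  // error above.
  const std::size_t n = std::min (ninterop, nappend);
  for (std::size_t i = 0; i < n; ++i)
    slots[i] = interop[i];
  return slots;
}
```

With three clauses and two slots the error flag is set and only the first
two clauses are used.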

gcc/ChangeLog:

PR middle-end/119541
* gimplify.cc (modify_call_for_omp_dispatch): Limit interop clauses
processing by the number of append_args arguments.

PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

In this testcase GCC tries to expand a VNx4BI vector:
  vector(4) <signed-boolean:4> _40;
  _39 = (<signed-boolean:4>) _24;
  _40 = {_39, _39, _39, _39};

This ends up in a scalarised sequence of bitfield insert operations.
This is despite the fact that AArch64 provides a vec_duplicate pattern
specifically for vec_duplicate into VNx4BI.

The store_constructor code is overly conservative when trying vec_duplicate
as it sees a requested VNx4BImode and an element mode of QImode, which I guess
is the storage mode of BImode objects.

The vec_duplicate expander in aarch64-sve.md explicitly allows QImode element
modes so it should be safe to use it.  This patch extends that mode check
to allow such expanders.

The testcase is heavily auto-reduced from a real application but in itself is
nonsensical, but it does demonstrate the current problematic codegen.

With this patch the testcase goes from:
        pfalse  p15.b
        str     p15, [sp, #6, mul vl]
        mov     w0, 0
        ldr     w2, [sp, 12]
        bfi     w2, w0, 0, 4
        uxtw    x2, w2
        bfi     w2, w0, 4, 4
        uxtw    x2, w2
        bfi     w2, w0, 8, 4
        uxtw    x2, w2
        bfi     w2, w0, 12, 4
        str     w2, [sp, 12]
        ldr     p15, [sp, #6, mul vl]

into:
        whilelo p15.s, wzr, wzr

The whilelo could be optimised away into a pfalse of course, but the important
part is that the bfis are gone.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

PR middle-end/119442
* expr.cc (store_constructor): Also allow element modes explicitly
accepted by target vec_duplicate pattern.

gcc/testsuite/

PR middle-end/119442
* gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.

libstdc++: Constrain formatters for chrono types [PR119517]

The formatters for chrono types defined the parse/format methods
as accepting unconstrained types. This, in combination with the lack
of a constraint on _CharT, led to them falsely satisfying the formattable
requirements for any type used as the character type.

This patch adjusts the formatter<T, CharT>::parse signature to:
constexpr typename basic_format_parse_context<_CharT>::iterator
parse(basic_format_parse_context<_CharT>& __pc);
And formatter<T, CharT>::format to:
template<typename _Out>
   typename basic_format_context<_Out, _CharT>::iterator
   format(const T& __t,
          basic_format_context<_Out, _CharT>& __fc) const;

Furthermore we constrain _CharT with __format::__char (char or wchar_t).
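
A small model of the signature half of the problem, written without
concepts.  Every type here is a hypothetical stand-in: ParseCtx plays the
role of basic_format_parse_context, and can_parse is a simplified check,
not the formattable concept.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for basic_format_parse_context<CharT>.
template<typename CharT>
struct ParseCtx {};

// Before: parse accepts any argument type at all.
struct LooseFmt
{
  template<typename PC>
  void parse (PC &) {}
};

// After: parse names the context type for one specific CharT.
template<typename CharT>
struct StrictFmt
{
  void parse (ParseCtx<CharT> &) {}
};

// Detects whether F::parse can be called with ParseCtx<CharT>.
template<typename F, typename CharT, typename = void>
struct can_parse : std::false_type {};

template<typename F, typename CharT>
struct can_parse<F, CharT,
                 decltype (void (std::declval<F &> ().parse (
                   std::declval<ParseCtx<CharT> &> ())))>
  : std::true_type {};
```

The unconstrained member template makes the check succeed for a context
over int, whereas the exact signature only accepts the context of its own
character type.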

PR libstdc++/119517

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (formatter):
Add __format::__char for _CharT and adjust parse and format
method signatures.
* testsuite/std/time/format/pr119517.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

gcc_release: Generate srcdir extras/infos/man pages from all FEs [PR119510]

Enabling cobol explicitly (at least unconditionally) in gcc_release has the
disadvantage that the script no longer works for GCC <= 14, I think it would
be better to keep it working for all still supported release branches.

And as mentioned in the PR, we still don't generate the
--enable-generated-files-in-srcdir extras/infos/man pages for languages
not actually enabled.
Using --enable-languages=all would mean gcc_release takes far longer and
more importantly, various FEs have extra dependencies, Ada requires a
working Ada compiler (furthermore not newer than the gcc release, so if
I run this on a system with say GCC 15 installed, even when I have Ada
installed, I won't be able to gcc_release GCC 14 or 13 etc.), D working D
compiler, Go takes a long time to build libgo.

So, the following patch instead takes a similar approach to what
make regenerate-opt-urls
takes, it generates stuff even for non-enabled languages.
For most languages it works just fine and is a matter of say for cobol
make cobol.srcextra cobol.srcinfo cobol.srcman
The only problem is Modula 2, which has some messed up dependencies and
when the FE is not enabled, this will try to build the whole FE as well and
fail.  I think it would be useful to fix that but at least before that is
fixed on the trunk and all release branches, the following patch just
conditionally (so that it works even for GCC 12 which doesn't have Modula 2)
enables also m2.

And lastly, libffi seems to be only enabled for Go (and maybe D), I'd prefer
not to enable those languages for the reasons stated above, so if we really
need libffi.info in release tarballs (despite libffi being used only as
implementation detail and not installed), the patch just generates it by
hand.

2025-03-29  Jakub Jelinek  <jakub@redhat.com>

PR other/119510
* gcc_release: Use --enable-languages=c,c++,lto and if m2 is available,
with ,m2 appended to that.  Check for all possible languages and run
make $lang.srcextra $lang.srcinfo $lang.srcman for those.  Add
libffi/doc/libffi.info.

target/119010 - add mode attribute to *vmovv16si_constm1_pternlog_false_dep

Like the other instances. This avoids

;; 1--> b 0: i6540 {xmm2=const_vector;unspec[xmm2] 38;} :nothing

PR target/119010
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep):
Add mode attribute.

target/119010 - Zen4/Zen5 reservations for movlhps loads

The following fixes up the ssemov2 type introduction, amending
the znver4_sse_mov_fp_load reservation. This fixes

;; 14--> b 0: i1436 xmm6=vec_concat(xmm6,[ax+0x8]) :nothing

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_mov_fp_load,
znver5_sse_mov_fp_load): Also match ssemov2.

target/119010 - reservations for Zen4/Zen5 movhlps to memory

The following adds missing reservations for the store variant of
sselog reservations covering

;; 112--> b 0: i1499 [dx-0x10]=vec_select(xmm10,parallel) :nothing

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_log_evex_store,
znver5_sse_log_evex_store): New reservations.

target/119010 - fixup Zen4/Zen5 fp<->int convert reservations

They were using ssecvt instead of sseicvt.  I've also added handling
for sseicvt2, which was introduced without fixing up the automata, and
the relevant instruction uses DFmode.  IMO this is quite a messy
area that could use some TLC in the machine description itself.

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_icvt): Use sseicvt.
(znver4_sse_icvt_store): Likewise.
(znver5_sse_icvt_store): Likewise.
(znver4_sse_icvt2): New.

target/119010 - handle DFmode in SSE divide reservations for Zen4/Zen5

Like the other DFmode cases.

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_div_pd,
znver4_sse_div_pd_load, znver5_sse_div_pd_load): Handle DFmode.

target/119010 - add reservations for integer vector compares to zen4/zen5

The following handles TI, OI and XI mode in the respective EVEX
compare reservations that do not use memory (I've not yet run into
ones that do).  The znver automaton has separate reservations for
integer compares (but only for zen1; for zen2 and zen3 there are
no compare reservations at all), but I don't see why that should
be necessary here.

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_cmp_avx128,
znver5_sse_cmp_avx128): Handle TImode.
(znver4_sse_cmp_avx256, znver5_sse_cmp_avx256): Handle OImode.
(znver4_sse_cmp_avx512, znver5_sse_cmp_avx512): Handle XImode.

target/119010 - missing reservations for Zen4/5 and SSE compares

There's the znver4_sse_test reservation which matches the memory-less
SSE compares but currently requires prefix_extra == 1. The old
znver automata in this case sometimes uses znver1-double instead of
znver1-direct, but it's quite a maze. The following simply drops
the prefix_extra requirement, but I have no idea what I'm doing here.
There doesn't seem to be any documentation on the scheduler relevant
attributes used, or at least I cannot find that.

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_test): Drop test of
prefix_extra attribute.

target/119010 - fixup zn4zn5 reservation for move from const_vector

movv8si_internal uses sselog1 and V4SFmode for an instruction like

(insn 363 2437 371 97 (set (reg:V8SI 46 xmm10 [1125])
        (const_vector:V8SI [
                (const_int 0 [0]) repeated x8
            ])) "ComputeNonbondedUtil.C":185:21 2402 {movv8si_internal}

This wasn't caught by the existing znver4_sse_log1 reservation.
I think the znver automaton catches this with the generic

(define_insn_reservation "znver1_sse_log1" 1
             (and (eq_attr "cpu" "znver1,znver2,znver3")
                  (and (eq_attr "type" "sselog1")
                   (eq_attr "memory" "none")))
             "znver1-direct,znver1-fp1|znver1-fp2")

which does not look at the mode at all.  The zn4zn5 automaton lacks
this and instead has separated store and load-store reservations
in odd ways.  The following renames the store one and introduces
a none variant.

PR target/119010
* config/i386/zn4zn5.md (znver4_sse_log1): Rename to
znver4_sse_log1_store.
(znver5_sse_log1): Rename to znver5_sse_log1_store.
(znver4_sse_log1): New memory-less variant.

c++: Honor noipa attribute for FE nothrow discovery [PR119518]

The following testcase has different code generation in bar depending on
whether foo is defined or just declared.
That is undesirable when it has the noipa attribute: that attribute is
documented to be a black box between caller and callee, so the caller
shouldn't know about any implicitly determined properties of the callee
and the callee shouldn't know about its callers.

E.g. the ipa-pure-const passes including nothrow discovery in there all
honor noipa attribute, but the FE did not.
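
A minimal sketch of the kind of code affected, assuming a GCC-style
compiler that understands the noipa attribute.  This is a hypothetical
reduction, not the committed testcase, and the names foo and bar are only
borrowed from the description above.

```cpp
// Hypothetical reduction, not the testcase added by the patch.
// foo's body provably cannot throw, but noipa asks the compiler to treat
// the function as a black box, so callers should not rely on that fact.
__attribute__((noipa)) int foo(int x)
{
  return x + 1;
}

int bar(int x)
{
  try
    {
      return foo(x);
    }
  catch (...)
    {
      // If the FE marked foo as nothrow, this handler could be dropped
      // when foo is defined but kept when foo is only declared.
      return -1;
    }
}
```

The run-time result is the same either way; only the generated code for
bar is expected to differ.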

2025-03-31  Jakub Jelinek  <jakub@redhat.com>

PR c++/119518
* decl.cc (finish_function): Don't set TREE_NOTHROW for
functions with "noipa" attribute even when we can prove
they can't throw.

* g++.dg/opt/pr119518.C: New test.

libstdc++: Fix up string _M_constructor<bool> exports [PR103827]

On Thu, Mar 27, 2025 at 02:04:24PM +0100, Jan Hubicka wrote:
> Seems I missed the approval, sorry. I will push it - I think it would
> be useful to have it in.

Unfortunately the exports in this patch only work on targets where size_t is
unsigned long, not e.g. on ia32 where it is unsigned int, or targets where
it is unsigned long long.

2025-03-31  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103827
PR tree-optimization/80331
PR tree-optimization/87502
* config/abi/pre/gnu.ver (GLIBCXX_3.4.34): Use [jmy] rather than m
in pattern for _M_construct<bool>(char const*, size_t).

Daily bump.

Docs: make regenerate-opt-urls

gcc/c-family/ChangeLog
* c.opt.urls: Regenerate.

gcc/d/ChangeLog
* lang.opt.urls: Regenerate.

gcc/m2/ChangeLog
* lang.opt.urls: Regenerate.

Optimize string constructor

This patch improves code generation for string constructors.  We currently
have _M_construct, which takes as parameters two iterators (begin/end pointers
to another string) and produces a new string.  This patch adds a special case
of the constructor where, instead of beginning/end pointers, we readily know
the string size, and also a special case when we know that the source is 0
terminated.  This happens commonly when producing string copies.  Moreover,
currently ipa-prop is not able to propagate the information that beg-end is a
known constant (the copied string size), which makes it impossible for the
inliner to spot the common case where the string size is known to be shorter
than 15 bytes and fits in the local buffer.
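
As an illustration only, these are user-level constructions of the kind
described above: a 0-terminated source, a source with a known size, and a
plain copy.  The function names are hypothetical, and which of these
constructors end up in the new _M_construct<bool> path is not stated in
this message.

```cpp
#include <cstddef>
#include <string>

// Source is a 0-terminated literal whose length is a compile-time constant.
std::string from_literal()
{
  return std::string("short");
}

// Source size is passed explicitly rather than as a begin/end pair.
std::string from_pointer_and_size(const char* p, std::size_t n)
{
  return std::string(p, n);
}

// Plain string copy, where the size is already stored in the source.
std::string copy_of(const std::string& s)
{
  return std::string(s);
}
```

All three results here are short enough to fit the local buffer mentioned
above.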

Finally I made the new constructor inline.  Because it is explicitly
instantiated without C++20 constexpr we do not produce an implicit
instantiation (as required by the standard), which prevents inlining,
ipa-modref and any other IPA analysis from happening.  I think we need to make
many of the other functions inline, since optimization across string
manipulation is quite important.  There is PR94960 to track this issue.

Bootstrapped/regtested x86_64-linux, OK?

libstdc++-v3/ChangeLog:

PR tree-optimization/103827
PR tree-optimization/80331
PR tree-optimization/87502

* config/abi/pre/gnu.ver: Add version for _M_construct<bool>.
* include/bits/basic_string.h (basic_string::_M_construct<bool>): Declare.
(basic_string constructors): Use it.
* include/bits/basic_string.tcc (basic_string::_M_construct<bool>): New template.
* src/c++11/string-inst.cc: Instantiate S::_M_construct<bool>.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr80331.C: New test.
* g++.dg/tree-ssa/pr87502.C: New test.

Doc: Clean up New/Delete Builtins manual section

I noticed that the "New/Delete Builtins" section failed to explicitly
name or describe the arguments of the builtin functions it purported
to document, outside of using them in an example. I've fixed that
and cleaned up the whole section.

gcc/ChangeLog
* doc/extend.texi (New/Delete Builtins): Clean up the text and
explicitly list the builtins being documented.

Doc: Move Integer Overflow Builtins section [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (Numeric Builtins): Move Integer Overflow Builtins
section here, as a subsection.

Doc: Organize atomic memory builtins documentation [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

This installment adds a container section to hold documentation for
both the __atomic and __sync builtins, reordering them so that the new
__atomic interface is presented before the legacy __sync one. I also
incorporated material from the separate x86 transactional memory
section directly into the __atomic builtins documentation instead of
retaining that as a parallel section.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (Atomic Memory Access): New section.
(__sync Builtins): Make it a subsection of the above.
(__atomic Builtins): Likewise.
(x86 specific memory model extensions for transactional memory):
Delete this section, incorporating the text into the discussion
of __atomic builtins.

Doc: Break up and rearrange the "other builtins" section [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

The "Other Builtins" section had become a catch-all for all sorts of
things with very little organization or attempt to differentiate
important information (e.g., GCC treats a gazillion library functions as
builtins by default) from obscure builtins provided primarily as internal
interfaces.  I've split it up into various pieces and attempted to move
the more important or useful-to-users documentation earlier in the chapter.
What's left of the section is still a jumbled mess...  but at least it's a
smaller jumbled mess.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (Built-in Functions): Incorporate some text
formerly in "Other Builtins" into the introduction.  Adjust
menu for new sections.
(Library Builtins): New section, split from "Other Builtins".
(Numeric Builtins): Likewise.
(Stack Allocation): Likewise.
(Constructing Calls): Move __builtin_call_with_static_chain here.
(Object Size Checking): Minor copy-editing.
(Other Builtins): Move text to new sections listed above.  Delete
duplicate docs for object-size checking builtins.
* doc/invoke.texi (C dialect options): Update @xref for -fno-builtin.

Doc: Move builtin documentation to a new chapter [PR42270]

This is part of an incremental effort to make the documentation for
GCC extensions better organized by grouping/rearranging sections by
topic.

I was originally intending to consolidate all the sections documenting
builtins as subsections of a new container section within the C
extensions chapter, but I ran into a technical limitation of Texinfo:
it only supports sectioning depth up to @subsubsection, and we already
had quite a few of those in the target-specific builtins sections. So
instead I have pulled all the existing sections out into a new
chapter. This actually makes sense since some of the builtins are
specific to C++ anyway and are not C language extensions at all.

Subsequent patches in this series will move things around within the
new chapter; this one just adds the new container node and adjusts
the menus.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (C Extensions): Move menu items for
builtin-related sections to...
(Built-in Functions): New chapter.
* doc/gcc.texi (Introduction): Add menu entry for new chapter.

Doc: Add a container section to consolidate attribute documentation [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
Note that this patch does not address the restructuring/rewrite
suggested by PR88472 or PR102397, beyond adding a very short
introduction to the new container section that is more explicit about
both syntaxes being accepted as a GNU extension.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (Attributes): New section.
(Function Attributes): Make it a subsection of the new section.
(Variable Attributes): Likewise.
(Type Attributes): Likewise.
(Label Attributes): Likewise.
(Enumerator Attributes): Likewise.
(Attribute Syntax): Likewise.

Doc: Remove separate "Target Format Checks" section [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

Following the last round of patches, there's a leftover section
"Target Format Checks" that didn't fit into any category. It seems best
to merge this material into the main discussion of the "format" attribute,
in particular because that discussion already contains similar discussion
for mingw/Windows targets.

gcc/ChangeLog
PR other/42270
* doc/extend.texi (Function Attributes): Merge text from "Target
Format Checks" into the main discussion of the format and
format_arg attributes.
(Target Format Checks): Delete section.

testsuite: Fix up atomic-inst-ldlogic.c

r15-8956 changed in the test:
-/* { dg-final { scan-assembler-times "ldclr\t" 16} */
+/* { dg-final { scan-assembler-times "ldclr\t" 16 } */
which made it even worse than before, when the directive had
been silently ignored because it didn't match the regex for
directives.  Now it matches it but is unbalanced.
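
The imbalance can be checked mechanically.  The sketch below counts
unclosed braces in the two directive forms quoted above.  The helper name
and the third, balanced form are assumptions for illustration and are not
quoted from the patch.

```cpp
#include <string>

// Returns the number of '{' left unclosed in the text; 0 means balanced.
int unclosed_braces(const std::string& text)
{
  int depth = 0;
  for (char c : text)
    {
      if (c == '{')
        ++depth;
      else if (c == '}')
        --depth;
    }
  return depth;
}

// The two forms quoted in this commit message.
const char* const before_r15_8956
  = R"(/* { dg-final { scan-assembler-times "ldclr\t" 16} */)";
const char* const after_r15_8956
  = R"(/* { dg-final { scan-assembler-times "ldclr\t" 16 } */)";

// Assumed balanced form, for comparison only.
const char* const assumed_balanced
  = R"(/* { dg-final { scan-assembler-times "ldclr\t" 16 } } */)";
```

Both quoted forms leave one brace open; adding the space changed whether
the directive is recognized, not whether it is balanced.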

The following patch fixes it and adds space after all the
other scan-assembler-times counts in the file.

2025-03-30  Jakub Jelinek  <jakub@redhat.com>

* gcc.target/aarch64/atomic-inst-ldlogic.c: Fix another
unbalanced {} directive problem.  Add space after all
scan-assembler-times counts.

aarch64: Changed CRC test.

Fixed the iteration number in crc-crc32c-data16.c test from 8 to 16 to match the test name.

gcc/testsuite
* gcc.target/aarch64/crc-crc32c-data16.c: Fix iteration
count to match testname.

Alpha: Add option to avoid data races for partial writes [PR117759]

Similarly to data races with 8-bit byte or 16-bit word quantity memory
writes on non-BWX Alpha implementations we have the same problem even on
BWX implementations with partial memory writes produced for unaligned
stores as well as block memory move and clear operations.  This happens
at the boundaries of the area written where we produce unprotected RMW
sequences, such as for example:

ldbu $1,0($3)
stw $31,8($3)
stq $1,0($3)

to zero a 9-byte member at the byte offset of 1 of a quadword-aligned
struct, happily clobbering a 1-byte member at the beginning of said
struct if a concurrent write happens while executing on the same CPU such
as in a signal handler or a parallel write happens while executing on
another CPU such as in another thread or via a shared memory segment.
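
A source-level sketch of the layout described above, with hypothetical
type and member names.  It only pins down the offsets involved; whether a
given compiler emits the quoted sequence for it depends on target and
options.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout: a quadword-aligned struct with a 1-byte member at
// offset 0 followed by a 9-byte member at offset 1.
struct alignas(8) Record
{
  unsigned char flag;     // the member at risk of being clobbered
  unsigned char body[9];  // the member being zeroed
};

static_assert(offsetof(Record, flag) == 0, "flag is the first byte");
static_assert(offsetof(Record, body) == 1, "body starts at byte offset 1");
static_assert(alignof(Record) == 8, "struct is quadword-aligned");

// Zeroes only body; the first quadword also holds flag, so a plain
// load/merge/store of that quadword rewrites flag as well.
void clear_body(Record* r)
{
  std::memset(r->body, 0, sizeof r->body);
}
```

In a single thread flag always survives, because the value stored back is
the one just loaded.  The problem arises only when another writer updates
flag between the load and the store.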

To guard against these data races with partial memory write accesses
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect boundaries of the data quantity accessed by instead
using a longer code sequence composed of narrower memory writes where
suitable machine instructions are available (i.e. with BWX targets) or
atomic RMW access sequences where byte and word memory access machine
instructions are not available (i.e. with non-BWX targets).

Owing to the desire of branch avoidance there are redundant overlapping
writes in unaligned cases where STQ_U operations are used in the middle
of a block so as to make sure no part of data to be written has been
lost regardless of run-time alignment.  For the non-BWX case it means
that with blocks whose size is not a multiple of 8 there are additional
atomic RMW sequences issued towards the end of the block in addition to
the always required pair enclosing the block from each end.

Only one such additional atomic RMW sequence is actually required, but
code currently issues two for the sake of simplicity.  An improvement
might be added to `alpha_expand_unaligned_store_words_safe_partial' in
the future, by folding `alpha_expand_unaligned_store_safe_partial' code
for handling multi-word blocks whose size is not a multiple of 8 (i.e.
with a trailing partial-word part).  It would improve performance a bit,
but current code is correct regardless.

Update test cases with `-mno-safe-partial' where required and add new
ones accordingly.

In some cases GCC chooses to open-code block memory write operations, so
with non-BWX targets `-msafe-partial' will in the usual case have to be
used together with `-msafe-bwa'.

Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for
the purpose of verifying the BWX side of this change.

gcc/
PR target/117759
* config/alpha/alpha-protos.h
(alpha_expand_unaligned_store_safe_partial): New prototype.
* config/alpha/alpha.cc (alpha_expand_movmisalign)
(alpha_expand_block_move, alpha_expand_block_clear): Handle
TARGET_SAFE_PARTIAL.
(alpha_expand_unaligned_store_safe_partial)
(alpha_expand_unaligned_store_words_safe_partial)
(alpha_expand_clear_safe_partial_nobwx): New functions.
* config/alpha/alpha.md (insvmisaligndi): Handle
TARGET_SAFE_PARTIAL.
* config/alpha/alpha.opt (msafe-partial): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.

gcc/testsuite/
PR target/117759
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add
`-mno-safe-partial'.
* gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stlx0-safe-partial.c: New file.
* gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stqx0-safe-partial.c: New file.
* gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'.  Refer
to stwx0.c rather than copying its code and also verify no LDQ_U
or STQ_U instructions have been produced.
* gcc.target/alpha/stwx0-safe-partial.c: New file.
* gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.

Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]

With non-BWX Alpha implementations we have a problem of data races where
an 8-bit byte or 16-bit word quantity is to be written to memory, because
in those cases we use an unprotected RMW access of a 32-bit longword or
64-bit quadword width.  If contents of the longword or quadword accessed
outside the byte or word to be written are changed midway through by a
concurrent write executing on the same CPU such as by a signal handler
or a parallel write executing on another CPU such as by another thread
or via a shared memory segment, then the concluding write of the RMW
access will clobber them.  This is especially important for the safety
of RCU algorithms, but is otherwise an issue anyway.
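
The lost update can be replayed deterministically in one thread.  This is
an illustrative model only: the helper names are hypothetical, byte lanes
are numbered from the least significant byte, and the concurrent writer is
simulated by an ordinary assignment placed between the load and the store.

```cpp
#include <cstdint>

// Replaces byte lane `lane` (0..3, lane 0 least significant) of `word`.
std::uint32_t insert_byte(std::uint32_t word, unsigned lane, std::uint8_t value)
{
  const unsigned shift = lane * 8;
  const std::uint32_t mask = std::uint32_t{0xff} << shift;
  return (word & ~mask) | (std::uint32_t{value} << shift);
}

// Models an unprotected RMW byte store racing with another byte store.
std::uint32_t replay_lost_update()
{
  std::uint32_t mem = 0x11223344;

  std::uint32_t loaded = mem;          // RMW: load the containing longword
  mem = insert_byte(mem, 1, 0xAA);     // simulated concurrent write to lane 1
  mem = insert_byte(loaded, 0, 0xBB);  // RMW: store back, using the stale copy

  return mem;                          // lane 1 holds 0x33 again, 0xAA is lost
}
```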

To guard against these data races with byte and aligned word quantities
introduce the `-msafe-bwa' command-line option (standing for Safe Byte &
Word Access) that instructs the compiler to instead use an atomic RMW
access sequence where byte and word memory access machine instructions
are not available.  There is no change to code produced for BWX targets.
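
As a portable analogy, not the code the compiler emits, the same idea can
be written with a compare-and-exchange retry loop in place of the LL/SC
pair.  The function name is hypothetical and lanes are again numbered from
the least significant byte.

```cpp
#include <atomic>
#include <cstdint>

// Stores one byte into a 32-bit word without losing concurrent updates to
// the other three bytes: the word is only written back if it still holds
// the value that the merge was computed from.
void store_byte_atomically(std::atomic<std::uint32_t>& word,
                           unsigned lane, std::uint8_t value)
{
  const unsigned shift = lane * 8;  // lane must be in 0..3
  const std::uint32_t mask = std::uint32_t{0xff} << shift;

  std::uint32_t expected = word.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do
    {
      desired = (expected & ~mask) | (std::uint32_t{value} << shift);
      // On failure, expected is refreshed with the current contents and
      // the merge is recomputed, much like retrying a failed STx_C.
    }
  while (!word.compare_exchange_weak(expected, desired,
                                     std::memory_order_relaxed));
}
```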

It would be sufficient for the secondary reload handler to use a pair of
scratch registers, as requested by `reload_out<mode>', but it would end
with poor code produced as one of the scratches would be occupied by
data retrieved and the other one would have to be reloaded with repeated
calculations, all within the LL/SC sequence.

Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler
and ask for more scratches there by defining a 256-bit OI integer mode.
While reload is documented in our manual to support an arbitrary number
of scratches, in reality it hasn't been implemented for IRA:

/* ??? It would be useful to be able to handle only two, or more than
   three, operands, but for now we can only handle the case of having
   exactly three: output, input and one temp/scratch.  */

and it seems to be the case for LRA as well.  Do what everyone else does
then and just have one wide multi-register scratch.

I note that the atomic sequences emitted are suboptimal performance-wise
as the looping branch for the unsuccessful completion of the sequence
points backwards, which means it will be predicted as taken despite that
in most cases it will fall through.  I do not see it as a deficiency of
this change proposed as it takes care of recording that the branch is
unlikely to be taken, by calling `alpha_emit_unlikely_jump'.  Therefore
generic code elsewhere should instead be investigated and adjusted
accordingly for the arrangement to actually take effect.

Add test cases accordingly.

There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-bwa' one:

FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O0  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O1  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -Os  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: g++.dg/init/array25.C  -std=c++17 execution test
FAIL: g++.dg/init/array25.C  -std=c++98 execution test
FAIL: g++.dg/init/array25.C  -std=c++26 execution test

They come from the fact that these test cases play tricks with alignment
and end up calling code that expects a reference to aligned data but is
handed one to unaligned data.

This doesn't cause a visible problem with plain `-mno-bwx' code, because
the resulting alignment exception is fixed up by Linux.  There's no such
handling currently implemented for LDL_L or LDQ_L instructions (which
are first in the sequence) and consequently the offender is issued with
SIGBUS instead.  Suitable handling will be added to Linux to complement
this change that will emulate the trapping instructions[1], so these
interim regressions are seen as harmless and expected.

References:

[1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency",
    <https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/>

gcc/
PR target/117759
* config/alpha/alpha-modes.def (OI): New integer mode.
* config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New
prototype.
* config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New
function.
(alpha_secondary_reload): Handle TARGET_SAFE_BWA.
* config/alpha/alpha.md (aligned_store_safe_bwa)
(unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa)
(reload_out<mode>_unaligned_safe_bwa): New expanders.
(mov<mode>, movcqi, reload_out<mode>_aligned): Handle
TARGET_SAFE_BWA.
(reload_out<mode>): Guard against TARGET_SAFE_BWA.
* config/alpha/alpha.opt (msafe-bwa): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.

gcc/testsuite/
PR target/117759
* gcc.target/alpha/stb.c: New file.
* gcc.target/alpha/stb-bwa.c: New file.
* gcc.target/alpha/stb-bwx.c: New file.
* gcc.target/alpha/stba.c: New file.
* gcc.target/alpha/stba-bwa.c: New file.
* gcc.target/alpha/stba-bwx.c: New file.
* gcc.target/alpha/stw.c: New file.
* gcc.target/alpha/stw-bwa.c: New file.
* gcc.target/alpha/stw-bwx.c: New file.
* gcc.target/alpha/stwa.c: New file.
* gcc.target/alpha/stwa-bwa.c: New file.
* gcc.target/alpha/stwa-bwx.c: New file.

IRA+LRA: Let the backend request to split basic blocks

The next change for Alpha will produce extra labels and branches in
reload, which in turn requires basic blocks to be split at completion.
We do this already for functions that can trap, so just extend the
arrangement with a flag for the backend to use whenever it finds it
necessary.

gcc/
* function.h (struct function): Add
`split_basic_blocks_after_reload' member.
* lra.cc (lra): Handle it.
* reload1.cc (reload): Likewise.

Alpha: Export `emit_unlikely_jump' for a subsequent change to use

Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so
as to avoid namespace pollution, updating callers accordingly and export
it for use in the machine description. Make it return the insn emitted.

gcc/
* config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New
prototype.
* config/alpha/alpha.cc (emit_unlikely_jump): Rename to...
(alpha_emit_unlikely_jump): ... this. Return the insn emitted.
(alpha_split_atomic_op, alpha_split_compare_and_swap)
(alpha_split_compare_and_swap_12, alpha_split_atomic_exchange)
(alpha_split_atomic_exchange_12): Update call sites accordingly.

gcc/testsuite/g++.dg/gomp/append-args-8.C: Fix scan-dump-tree

gcc/testsuite/ChangeLog:

* g++.dg/gomp/append-args-8.C: Remove bogus '3' after \.\[0-9\]+
pattern.