git.ipfire.org Git - thirdparty/gcc.git/log

Add a cache of recent lines

For larger files the file_cache line index will be spread out to make
the index fit into the fixed buffer, so any access to the non latest line
will need some skipping of lines.

Most accesses for line are near the latest line because
a diagnostic is likely near where the scanner is currently lexing.

Add a second cache for recent lines. It is organized as a ring buffer
and maintains the last 256 lines relative to the last input line.

With that, enabling -Wmisleading-indentation for the test case in
PR preprocessor/118168, is within the run-to-run variation.

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache::m_line_recent,
m_line_recent_first, m_line_recent_last): Add.
(file_cache_slot::evict): Clear new fields.
(file_cache_slot::create): Clear new fields.
(file_cache_slot::file_cache_slot): Initialize new fields.
(file_cache_slot::~file_cache_slot): Release m_line_recent.
(file_cache_slot::get_next_line): Maintain ring buffer of lines
in m_line_recent.
(file_cache_slot::read_line_num): Use m_line_recent to look up
recent lines quickly.

arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089]

For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry.  This saves
a couple of bytes in the resulting image.  This is a relatively niche
case as it's rare to push a single low register onto the stack, but
still worth getting right.

Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.

Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.

Secondly, we iterate over just the non-zero bits in the reg mask
rather than every bit and then checking if there's work to do for that
bit.

Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note.  The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.

gcc:
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
Don't emit LDR on thumb2 when POP can be used for smaller code.
Don't add a CFA adjust note when SP is popped off the stack.

gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c: New test.

arm: fix ICE due to fix for POP {PC} change

My earlier change for making the compiler prefer

POP {PC}

over

LDR PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC. This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.

[rtl-optimization/116244] Don't create bogus regs in alter_subreg

> Jeff Law <jeffreyalaw@gmail.com> writes:
>> So pulling on this thread leads me into the code that sets up
>> ALLOCNO_WMODE in create_insn_allocnos:
>>
>>>            if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
>>>              {
>>>                a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);
>>>                if (outer != NULL && GET_CODE (outer) == SUBREG)
>>>                  {
>>>                    machine_mode wmode = GET_MODE (outer);
>>>                    if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
>>>                      ALLOCNO_WMODE (a) = wmode;
>>>                  }
>>>              }
>> Note how we only set ALLOCNO_MODE only at allocno creation, so it'll
>> work as intended if and only if the first reference is via a SUBREG.
>
> Huh, yeah, I agree that that looks wrong.
>
>> ISTM the fix here is to always do the check and set ALLOCNO_WMODE.
>>[ Snipped discussion on a non-issue. ]

>
> So ISTM that moving the code out of the "if (... == NULL)" should be
> enough on its own.
>
>> And it all makes sense that you caught this.  You and another colleague
>> at ARM were trying to address this exact problem ~11 years ago ;-)
>
> Heh, thought it sounded familiar :)

So attached is the updated patch that adjusts IRA to avoid this problem.

Georg-Johann, this may explain an issue you were running into as well where you
got an invalid allocation.  I think yours was at the higher end of the register
file, but the core issue is potentially the same (looking at the first use
rather than all of them for paradoxical subregs).

I've had this in my tester about a week.  So it's been through the crosses as
well as various native bootstraps, including but not limited to m68k, ppc,
s390, hppa, sh4, etc.  And just for good measure I bootstrapped & regression
tested it on x86_64 a few minutes ago.

Pushing to the trunk.

PR rtl-optimization/116244
gcc/
* ira-build.cc (create_insn_allocnos): Do not restrict the check for
subreg uses to allocno creation time.  Do it for all uses.

gcc/testsuite/
* g++.target/m68k/m68k.exp: New test driver.
* g++.target/m68k/pr116244.C: New test.

c++: Fix up name independent decl in structured binding handling in range for [PR115586]

cp_parser_range_for temporarily reverts IDENTIFIER_BINDING changes
to hide the decls from the structured bindings from lookup during
parsing of the expression after :
If there are 2 or more name independent decls, we undo IDENTIFIER_BINDING
for the same name multiple times, even when just one has been added
(with a TREE_LIST inside of it as decl).

The following patch fixes it by handling the _ name at most once, the
later loop will DTRT then and just reinstall the temporarily hidden
binding with the TREE_LIST in there.

2025-02-07 Jakub Jelinek <jakub@redhat.com>

PR c++/115586
* parser.cc (cp_parser_range_for): For name independent decls in
structured bindings, only push the name/binding once per
structured binding.

* g++.dg/cpp26/name-independent-decl9.C: New test.
* g++.dg/cpp26/name-independent-decl10.C: New test.

c++: Fix up handling of for/while loops with declarations in condition [PR86769]

As the following testcase show (note, just for-{3,4,6,7,8}.C, constexpr-86769.C
and stmtexpr27.C FAIL without the patch, the rest is just that I couldn't
find coverage for some details and so added tests we don't regress or for5.C
is from Marek's attempt in the PR), we weren't correctly handling for/while
loops with declarations as conditions.

The C++ FE has the simplify_loop_decl_cond function which transforms
such loops as mentioned in the comment:
            while (A x = 42) { }
            for (; A x = 42;) { }
   becomes
            while (true) { A x = 42; if (!x) break; }
            for (;;) { A x = 42; if (!x) break; }
For for loops this is not enough, as the x declaration should be
still in scope when expr (if any) is executed, and injecting the
expr expression into the body in the FE needs to have the continue
label in between, something normally added by the c-family
genericization.  One of my thoughts was to just add there an artificial
label plus the expr expression in the FE and tell c-family about that
label, so that it doesn't create it but uses what has been emitted.

Unfortunately break/continue are resolved to labels only at c-family
genericization time and by moving the condition (and its preparation
statements such as the DECL_EXPR) into the body (and perhaps by also
moving there the (increment) expr as well) we resolve incorrectly any
break/continue statement appearing in cond (or newly perhaps also expr)
expression(s).  While in standard C++ one can't have something like that
there, with statement expressions they are possible there, and we actually
have testsuite coverage that when they appear outside of the body of the
loop they bind to an outer loop rather than the inner one.  When the FE
moves everything into the body, c-family can't distinguish any more between
the user body vs. the condition/preparation statements vs. expr expression.

So, the following patch instead keeps them separate and does the merging
only at the c-family loop genericization time.  For that the patch
introduces two new operands of FOR_STMT and WHILE_STMT, *_COND_PREP
which is forced to be a BIND_EXPR which contains the preparation statements
like DECL_EXPR, and the initialization of that variable, so basically what
{FOR,WHILE}_BODY has when we encounter the function dealing with this,
except one optional CLEANUP_STMT at the end which holds cleanups for the
variable if it needs to be destructed.  This CLEANUP_STMT is removed and
the actual cleanup saved into another new operand, *_COND_CLEANUP.

The c-family loop genericization handles such loops roughly the way
https://eel.is/c++draft/stmt.for and https://eel.is/c++draft/stmt.while
specifies, so the body is (if *_COND_CLEANUP is non-NULL)
{ A x = 42; try { if (!x) break; body; cont_label: expr; } finally { cleanup; } }
and otherwise
{ A x = 42; if (!x) break; body; cont_label: expr; }
i.e. the *_COND, *_BODY, optional continue label, FOR_EXPR  are appended
into the body of the *_COND_PREP BIND_EXPR.

And when doing constexpr evaluation of such FOR/WHILE loops, we treat
it similarly, first evaluate *_COND_PREP except the
      for (tree decl = BIND_EXPR_VARS (t); decl; decl = DECL_CHAIN (decl))
        destroy_value_checked (ctx, decl, non_constant_p);
part of BIND_EXPR handling for it, then evaluate *_COND (and decide based
on whether it was false or true like before), then *_BODY, then FOR_EXPR,
then *_COND_CLEANUP (similarly to the way how CLEANUP_STMT handling handles
that) and finally do those destroy_value_checked.

Note, the constexpr-86769.C testcase FAILs with both clang++ and MSVC (note,
the rest of tests PASS with clang++) but I believe it must be just a bug
in those compilers, new int is done in all the constructors and delete is
done in the destructor, so when clang++ reports one of the new int weren't
deallocated during constexpr evaluation I don't see how that would be
possible.  When the same test has all the constexpr stuff, all the new int
are properly deleted at runtime when compiled by both compilers and valgrind
is happy about it, no leaks.

2025-02-07  Jakub Jelinek  <jakub@redhat.com>
    Jason Merrill  <jason@redhat.com>

PR c++/86769
gcc/c-family/
* c-common.def (FOR_STMT): Add 2 operands and document them.
(WHILE_STMT): Likewise.
* c-common.h (WHILE_COND_PREP, WHILE_COND_CLEANUP): Define.
(FOR_COND_PREP, FOR_COND_CLEANUP): Define.
* c-gimplify.cc (genericize_c_loop): Add COND_PREP and COND_CLEANUP
arguments, handle them if they are non-NULL.
(genericize_for_stmt, genericize_while_stmt, genericize_do_stmt):
Adjust callers.
gcc/c/
* c-parser.cc (c_parser_while_statement): Add 2 further NULL_TREE
operands to build_stmt.
(c_parser_for_statement): Likewise.
gcc/cp/
* semantics.cc (set_one_cleanup_loc): New function.
(set_cleanup_locs): Use it.
(simplify_loop_decl_cond): Remove.
(adjust_loop_decl_cond): New function.
(begin_while_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_while_stmt_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_while_stmt): Call do_poplevel also on WHILE_COND_PREP if
non-NULL and also use pop_stmt_list rather than do_poplevel for
WHILE_BODY in that case.  Call set_one_cleanup_loc.
(begin_for_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_for_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_for_stmt): Call do_poplevel also on FOR_COND_PREP if non-NULL
and also use pop_stmt_list rather than do_poplevel for FOR_BODY in
that case.  Call set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Handle
{WHILE,FOR}_COND_{PREP,CLEANUP}.
(check_for_return_continue): Handle {WHILE,FOR}_COND_PREP.
(potential_constant_expression_1): RECUR on
{WHILE,FOR}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/diagnostic/redeclaration-7.C: New test.
* g++.dg/expr/for3.C: New test.
* g++.dg/expr/for4.C: New test.
* g++.dg/expr/for5.C: New test.
* g++.dg/expr/for6.C: New test.
* g++.dg/expr/for7.C: New test.
* g++.dg/expr/for8.C: New test.
* g++.dg/ext/stmtexpr27.C: New test.
* g++.dg/cpp2a/constexpr-86769.C: New test.
* g++.dg/cpp26/name-independent-decl7.C: New test.
* g++.dg/cpp26/name-independent-decl8.C: New test.

jit/118780 - make sure to include dlfcn.h when plugin support is disabled

The following makes the dlfcn.h explicitly requested which avoids
build failure when JIT is enabled but plugin support disabled as
currently the include is conditional on plugin support.

PR jit/118780
gcc/
* system.h: Check INCLUDE_DLFCN_H for including dlfcn.h instead
of ENABLE_PLUGIN.
* plugin.cc: Define INCLUDE_DLFCN_H.

gcc/jit/
* jit-playback.cc: Define INCLUDE_DLFCN_H.
* jit-result.cc: Likewise.

libstdc++: fix a dangling reference crash in ranges::is_permutation [PR118160]

The code was caching the result of `invoke(proj, *it)` in a local
`auto &&` variable. The problem is that this may create dangling
references, for instance in case `proj` is `std::identity` (the common
case) and `*it` produces a prvalue: lifetime extension does not
apply here due to the expressions involved.

Instead, store (and lifetime-extend) the result of `*it` in a separate
variable, then project that variable. While at it, also forward the
result of the projection to the predicate, so that the predicate can
act on the proper value category.

libstdc++-v3/ChangeLog:

PR libstdc++/118160
PR libstdc++/100249
* include/bits/ranges_algo.h (__is_permutation_fn): Avoid a
dangling reference by storing the result of the iterator
dereference and the result of the projection in two distinct
variables, in order to lifetime-extend each one.
Forward the projected value to the predicate.
* testsuite/25_algorithms/is_permutation/constrained.cc: Add a
test with a range returning prvalues. Test it in a constexpr
context, in order to rely on the compiler to catch UB.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>

libstdc++: Handle exceptions in std::ostream::sentry destructor

Because basic_ostream::sentry::~sentry is implicitly noexcept, we can't
let any exceptions escape from it, or the program would terminate. If
the streambuf's sync() function throws, or if it returns an error and
setting badbit in the stream state throws, then the program would
terminate.

LWG 835 intended to prevent exceptions from being thrown by the
std::basic_ostream::sentry destructor, but failed to cover the case
where the streambuf's sync() member throws an exception. LWG 4188 is
needed to fix that part. In any case, LWG 835 was never implemented for
libstdc++ so this does that, as well as my proposed fix for 4188 (that
badbit should be set if pubsync() exits via an exception).

In order to avoid a second try-catch block to handle an exception that
might be thrown by setting badbit, this introduces an RAII helper class
that temporarily clears the stream's exceptions mask, then restores it
afterwards.

The try-catch block doesn't handle the forced_unwind exception
explicitly, because catching and rethrowing that would just terminate
when it reached the sentry's implicit noexcept(true) anyway.

libstdc++-v3/ChangeLog:

* include/bits/ostream.h (basic_ostream::_Disable_exceptions):
RAII helper type.
(basic_ostream::sentry::~sentry): Use _Disable_exceptions. Add
try-catch block around call to pubsync.
* testsuite/27_io/basic_ostream/exceptions/char/lwg4188.cc: New
test.
* testsuite/27_io/basic_ostream/exceptions/wchar_t/lwg4188.cc:
New test.

libstdc++: Add comment about use of always_inline attributes [PR111050]

Add a comment referencing PR 111050, to ensure the fix made by
r12-9903-g1be57348229666 doesn't get reverted.

libstdc++-v3/ChangeLog:

PR libstdc++/111050
* include/bits/hashtable_policy.h (_Hash_node_value_base): Add
comment about always_inline attributes.

RISC-V: Make VXRM as global register [PR118103]

Inspired by PR118103, the VXRM register should be treated almost the
same as the FRM register, aka cooperatively-managed global register.
Thus, add the VXRM to global_regs to avoid the elimination by the
late-combine pass.

For example as below code:

  21   │
  22   │ void compute ()
  23   │ {
  24   │   size_t vl = __riscv_vsetvl_e16m1 (N);
  25   │   vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
  26   │   vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
  27   │   vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);
  28   │
  29   │   __riscv_vse16_v_u16m1 (c, vc, vl);
  30   │ }
  31   │
  32   │ int main ()
  33   │ {
  34   │   initialize ();
  35   │   compute();
  36   │
  37   │   return 0;
  38   │ }

After compile with -march=rv64gcv -O3, we will have:

  30   │ compute:
  31   │     csrwi   vxrm,2
  32   │     lui a3,%hi(a)
  33   │     lui a4,%hi(b)
  34   │     addi    a4,a4,%lo(b)
  35   │     vsetivli    zero,4,e16,m1,ta,ma
  36   │     addi    a3,a3,%lo(a)
  37   │     vle16.v v2,0(a4)
  38   │     vle16.v v1,0(a3)
  39   │     lui a4,%hi(c)
  40   │     addi    a4,a4,%lo(c)
  41   │     vaaddu.vv   v1,v1,v2
  42   │     vse16.v v1,0(a4)
  43   │     ret
  44   │     .size   compute, .-compute
  45   │     .section    .text.startup,"ax",@progbits
  46   │     .align  1
  47   │     .globl  main
  48   │     .type   main, @function
  49   │ main:
       |     // csrwi   vxrm,2 deleted after inline
  50   │     addi    sp,sp,-16
  51   │     sd  ra,8(sp)
  52   │     call    initialize
  53   │     lui a3,%hi(a)
  54   │     lui a4,%hi(b)
  55   │     vsetivli    zero,4,e16,m1,ta,ma
  56   │     addi    a4,a4,%lo(b)
  57   │     addi    a3,a3,%lo(a)
  58   │     vle16.v v2,0(a4)
  59   │     vle16.v v1,0(a3)
  60   │     lui a4,%hi(c)
  61   │     addi    a4,a4,%lo(c)
  62   │     li  a0,0
  63   │     vaaddu.vv   v1,v1,v2

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/118103

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the VXRM as the global_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

[testsuite] tolerate later success [PR108357]

On leon3-elf and presumably on other targets, the test fails due to
differences in calling conventions and other reasons, that add extra
gimple stmts that prevent the expected optimization at the expected
point. The optimization takes place anyway, just a little later, so
tolerate that.

for gcc/testsuite/ChangeLog

PR tree-optimization/108357
* gcc.dg/tree-ssa/pr108357.c: Tolerate later optimization.

aarch64: Fix bootstrap with --enable-checking=release [PR118771]

With release checking we get an uninitialization warning
inside aarch64_split_move because of jump threading for the case of `npieces==0`
but `npieces` is never 0 (but there is no way the compiler can know that.
So this fixes the issue by adding a `gcc_assert` to the function which asserts
that `npieces > 0` and fixes the uninitialization warning.

Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).

The warning:

aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
3418 |   if (reg_overlap_mentioned_p (dst_pieces[0], src))
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
aarch64.cc:3408:20: note: 'dst_pieces' declared here
3408 |   auto_vec<rtx, 4> dst_pieces, src_pieces;
      |                    ^~~~~~~~~~

PR target/118771
gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is
greater than 0.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Honor dump options for C/C++ '-fdump-tree-original'

In addition to upcoming use of '-fdump-tree-original-lineno', this patch
actually resolves XFAILs for 'c-c++-common/goacc/pr92793-1.c', which had
gotten added as part of commit fa410314ec94c9df2ad270c1917adc51f9147c2c
"[OpenACC] Elaborate testcases that verify column location information [PR92793]".

gcc/c-family/
* c-gimplify.cc (c_genericize): Pass 'local_dump_flags' to
'print_c_tree'.
* c-pretty-print.cc (c_pretty_printer::statement): Pass
'dump_flags' to 'dump_generic_node'.
(c_pretty_printer::c_pretty_printer): Initialize 'dump_flags'.
(print_c_tree): Add 'dump_flags_t' formal parameter.
(debug_c_tree): Adjust.
* c-pretty-print.h (c_pretty_printer): Add 'dump_flags_t
dump_flags'.
(c_pretty_printer::c_pretty_printer): Add 'dump_flags_t' formal
parameter.
(print_c_tree): Adjust.
gcc/testsuite/
* c-c++-common/goacc/pr92793-1.c: Remove
'-fdump-tree-original-lineno' XFAILs.

c++: ICE with unparsed noexcept [PR117106]

In a member-specification of a class, a noexcept-specifier is
a complete-class context.  Thus we delay parsing until the end of
the class via our DEFERRED_PARSE mechanism; see cp_parser_save_noexcept
and cp_parser_late_noexcept_specifier.

We also attempt to defer instantiation of noexcept-specifiers in order
to reduce the number of instantiations; this is done via DEFERRED_NOEXCEPT.

We can even have both, as in noexcept65.C: a DEFERRED_PARSE wrapped in
DEFERRED_NOEXCEPT, which uses the DEFPARSE_INSTANTIATIONS mechanism.
noexcept65.C works, because when we really need the noexcept, which is
when parsing the body of S::A::A(), the noexcept will have been parsed
already; noexcepts are parsed before bodies of member function.

But in this test we have:

  struct A {
      int x;
      template<class>
      void foo() noexcept(noexcept(x)) {}
      auto bar() -> decltype(foo<int>()) {} // #1
  };

and I think the decltype in #1 needs the unparsed noexcept before it
could have been parsed.  clang++ rejects the test and I suppose we
should reject it as well, rather than crashing on a DEFERRED_PARSE
in tsubst_expr.

PR c++/117106
PR c++/118190

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Give an error if the noexcept
hasn't been parsed yet.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept89.C: New test.
* g++.dg/cpp0x/noexcept90.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Properly support null pointer constants in conditional operators [PR118282]

We've been rejecting the following valid code since GCC 4

=== cut here ===
struct A {
  explicit A (int);
  operator void* () const;
};
void foo (const A& x) {
  auto res = 0 ? x : 0;
}
int main () {
  A a{5};
  foo(a);
}
=== cut here ===

The problem is that for COND_EXPR, add_builtin_candidate has an early
return if the true and false values are not pointers that does not take
null pointer constants into account. This causes to not find any valid
conversion, and fail to compile.

This patch fixes the condition to also pass if the true/false values are
not pointers but null pointer constants, which resolves the PR.

PR c++/118282

gcc/cp/ChangeLog:

* call.cc (add_builtin_candidate): Also check for null_ptr_cst_p
operands.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/op8.C: New test.

c++: Don't use CLEANUP_EH_ONLY for new expression cleanup [PR118763]

The following testcase is miscompiled since r12-6325 stopped
preevaluating the initializers for new expression.
If evaluating the initializers throws, there is a correct cleanup
for that, but it is marked CLEANUP_EH_ONLY.  While in standard
C++ that is just fine, if it has statement expressions, it can
return or goto out of the expression and we should delete the
pointer in that case too.

There is already a sentry variable initialized to true and
set to false after everything is initialized and used as a guard
for the cleanup, so just removing the CLEANUP_EH_ONLY flag does
everything we need.  And in the normal case of the initializer
not using statement expressions at least with -O2 we get the same code,
while the change changes one
try { sentry = true; ... sentry = false; } catch { if (sentry) delete ...; }
into
try { sentry = true; ... sentry = false; } finally { if (sentry) delete ...; }
optimizations will see that sentry is false when reaching the finally
other than through an exception.

Though, wonder what other CLEANUP_EH_ONLY cleanups might be an issue
with statement expressions.

2025-02-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/118763
* init.cc (build_new_1): Don't set CLEANUP_EH_ONLY.

* g++.dg/asan/pr118763.C: New test.

c++: Use cplus_decl_attributes rather than decl_attributes in grokdecl [PR118773]

My r15-3046 change regressed the first half of the following testcase.
When it calls decl_attributes, it doesn't handle attributes with
dependent arguments correctly and so is now rejected that N is not
a constant integer during template parsing.

I've actually followed the pointer/reference case which did that
too and that one has been failing for a couple of years on the
second part of the testcase.

Note, there is also
          if (decl_context != PARM && decl_context != TYPENAME)
            /* Assume that any attributes that get applied late to
               templates will DTRT when applied to the declaration
               as a whole.  */
            late_attrs = splice_template_attributes (&attrs, type);
          returned_attrs = decl_attributes (&type,
                                            attr_chainon (returned_attrs,
                                                          attrs),
                                            attr_flags);
          returned_attrs = attr_chainon (late_attrs, returned_attrs);
call directly to decl_attributes in grokdeclarator, but this one handles
the splicing manually, so maybe it is ok as is (and I don't have a testcase
of anything misbehaving for that).

2025-02-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/118773
* decl.cc (grokdeclarator): Use cplus_decl_attributes rather than
decl_attributes for std_attributes on pointer and array types.

* g++.dg/cpp0x/gen-attrs-87.C: New test.
* g++.dg/gomp/attrs-3.C: Adjust expected diagnostics.

c++: Allow constexpr reads from volatile std::nullptr_t objects [PR118661]

As mentioned in the PR, https://eel.is/c++draft/conv.lval#note-1
says that even volatile reads from std::nullptr_t typed objects actually
don't read anything and https://eel.is/c++draft/expr.const#10.9
says that even those are ok in constant expressions.

So, the following patch adjusts the r9-4793 changes to have an exception
for NULLPTR_TYPE.
As [conv.lval]/3 also talks about accessing to inactive member, I've added
testcase to cover that as well.

2025-02-07 Jakub Jelinek <jakub@redhat.com>

PR c++/118661
* constexpr.cc (potential_constant_expression_1): Don't diagnose
lvalue-to-rvalue conversion of volatile lvalue if it has NULLPTR_TYPE.
* decl2.cc (decl_maybe_constant_var_p): Return true for constexpr
decls with NULLPTR_TYPE even if they are volatile.

* g++.dg/cpp0x/constexpr-volatile4.C: New test.
* g++.dg/cpp0x/constexpr-union9.C: New test.

Fortran: Fix default init of finalizable derived argus [PR116829]

2025-02-07 Tomáš Trnka <trnka@scm.com>

gcc/fortran
PR fortran/116829
* trans-decl.cc (init_intent_out_dt): Always call
gfc_init_default_dt() for BT_DERIVED to apply s->value if the
symbol isn't allocatable. Also simplify the logic a bit.

gcc/testsuite/
PR fortran/116829
* gfortran.dg/derived_init_7.f90: New test.

tree-optimization/115538 - possible wrong-code with SLP conversion

The following fixes a latent issue where we use ranges to verify
correctness of a vector conversion optimization. We rely on ranges
from 'op0' which for SLP is extracted from the representative stmt
which does not necessarily correspond to any actual scalar operation.
We also do not verify the range of all scalar lanes in the SLP
operand match. The following rectifies this, restricting the support
to single-lane SLP nodes at this point - on branches we'd simply
not perform this optimization with SLP.

PR tree-optimization/115538
* tree-vectorizer.h (vect_get_slp_scalar_def): Declare.
* tree-vect-slp.cc (vect_get_slp_scalar_def): New helper.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): For SLP
correctly look at ranges of the scalar defs of the SLP operand.
(supportable_indirect_convert_operation): Likewise.

[gcn] Fix the output amdhsa.version

The amdhsa.version depends on the code object version; while V3 had 1.0,
V4 has 1.1 and V5 (and V6) have 1.2. GCC used 1.0 but generated since
a while either V4 or, with -march=gfx...-generic, V6. Now it uses the
proper version again.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Update
'amdhsa.version' output to match used code version.
* config/gcn/gen-gcn-device-macros.awk: Add a comment to
crosslink.

[GCN] Handle generic ISA names in libgomp's plugin-gcn.c

libgomp/ChangeLog:

* plugin/plugin-gcn.c (ELFABIVERSION_AMDGPU_HSA_V6,
EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET,
GET_GENERIC_VERSION): New #define.
(elf_gcn_isa_is_generic): New.
(isa_matches_agent): Accept all generic code objects on the first
go; extend the diagnostic and handle runtime-failed case.
(create_and_finalize_hsa_program): Call it also after loading
the code failed, pass the status.

LoongArch: Correct the mode for mask{eq,ne}z

For mask{eq,ne}z, rk is always compared with 0 in the full width, thus
the mode for rk should be X.

I found the issue reviewing a patch fixing a similar issue for RISC-V
XTheadCondMov [1], but interestingly I cannot find a test case really
blowing up on LoongArch. But as the issue is obvious enough let's fix
it anyway so it won't blow up in the future.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674004.html

gcc/ChangeLog:

* config/loongarch/loongarch.md
(*sel<code><GPR:mode>_using_<GPR2:mode>): Rename to ...
(*sel<code><GPR:mode>_using_<X:mode>): ... here.
(GPR2): Remove as nothing uses it now.

[ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

If decode_field_reference finds a load that accesses past the inner
object's size, bail out.

Drop the too-strict assert.

for  gcc/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gimple-fold.cc (decode_field_reference): Refuse to consider
merging out-of-bounds BIT_FIELD_REFs.
(make_bit_field_load): Drop too-strict assert.
* tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
(access_in_bounds_of_type_p): ... this.  Change interface,
export.
(tree_could_trap_p): Adjust.
* tree-eh.h (access_in_bounds_of_type_p): Declare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118514
PR tree-optimization/118706
* gcc.dg/field-merge-25.c: New.

[gcn] Add gfx9-generic and generic-associated gfx*

This patch adds gfx9-generic, completing the gfx*-generic support.
It also adds all gfx* devices that are part of any of the gfx*-generic,
i.e. gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.

gcc/ChangeLog:

* config/gcn/gcn-devices.def (GCN_DEVICE): Add gfx9-generic,
gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
Add a currently unused column linking, a specific ISA to a generic
one (if it exists).
* config/gcn/gcn-tables.opt: Regenerate
* doc/invoke.texi (AMD GCN): Add the the new gfc... and the older
gfx{10-3,11}-generic to -march= as 'experimental'.

[gcn] Fix gfx906's sramecc setting

When compiling with -g, mkoffload.cc creates a device object file itself;
however, in order that the linker dos not complain, the ELF flags must
match what the compiler / linker does. For gfx906, the assembler defaults
to sramecc = any, but gcn-devices.def contained unsupported, which is not
the same - causing link errors. That's a regression caused by commit
r15-4540-ga6b26e5ea09779 - which can be best seen by looking at the
changes to mkoffload.cc.

Additionally, this commit adds '...' to the GCN_DEVICE #define in gcn.cc
to make it agnostic to the addition of fields.

gcc/ChangeLog:

* config/gcn/gcn-devices.def (GCN_DEVICE): Change sramecc for
gfx906 to 'any'.
* config/gcn/gcn.cc (GCN_DEVICE): Add tailing ... to #define.

[testsuite] [sparc] select ultrasparc for fsmuld test

vis3move-3.c expects fsmuld, that is not available on all variants of
sparc.  Select a cpu that supports it for the test.

Now, -mfix-ut699 irrevocbly disables fsmuld, so skip the test if the
test configuration uses that option.

for  gcc/testsuite/ChangeLog

* gcc.target/sparc/vis3move-3.c: Select ultrasparc.  Skip with
-mfix-ut699.

[testsuite] [sparc] skip tls tests if emulated

A number of tls tests expect TLS-specific relocations, that are not
present when tls is emulated, as on e.g. leon3-elf. Skip the tests
when tls is emulated.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/tls-ld-int16.c: Skip when tls is emulated.
* gcc.target/sparc/tls-ld-int32.c: Likewise.
* gcc.target/sparc/tls-ld-int8.c: Likewise.
* gcc.target/sparc/tls-ld-uint16.c: Likewise.
* gcc.target/sparc/tls-ld-uint32.c: Likewise.
* gcc.target/sparc/tls-ld-uint8.c: Likewise.

[testsuite] [sparc] skip sparc-ret-1 with -mfix-ut699

Option -mfix-ut699 changes the set of instructions that can be placed
in the delay slot, preventing the expected insn placement. Skip the
test if the option is present.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/sparc-ret-1.c: Skip on -mfix-ut699.

[testsuite] [sparc] use -mtune in alignment tuning test

If -mcpu=leon3 is present in the command line for a test run,
overriding it with -mcpu=niagara7 is not enough to override the tuning
for leon3 selected by the previous -mcpu option.

niagara7-align.c tests for niagara7 alignment tuning, so use -mtune
rather than -mcpu.

for gcc/testsuite/ChangeLog

* gcc.target/sparc/niagara7-align.c: Use -mtune.

ira: Add a target hook for callee-saved register cost scale

commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
Date:   Tue Jun 25 08:37:49 2024 -0500

    ira: Scale save/restore costs of callee save registers with block frequency

scales the cost of saving/restoring a callee-save hard register in epilogue
and prologue with the entry block frequency, which, if not optimizing for
size, is 10000, for all targets.  As the result, callee-saved registers
may not be used to preserve local variable values across calls on some
targets, like x86.  Add a target hook for the callee-saved register cost
scale in epilogue and prologue used by IRA.  The default version of this
target hook returns 1 if optimizing for size, otherwise returns the entry
block frequency.  Add an x86 version of this target hook to restore the
old behavior prior to the above commit.

PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/117082
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Call the target hook for the
callee-saved register cost scale in epilogue and prologue.
* target.def (ira_callee_saved_register_cost_scale): New target
hook.
* targhooks.cc (default_ira_callee_saved_register_cost_scale):
New.
* targhooks.h (default_ira_callee_saved_register_cost_scale):
Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
New.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
New.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

[PATCH] RISC-V: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST to correct enum

stack_protect_{set,test}_<mode> were showing up in RTL dumps as
UNSPEC_COPYSIGN and UNSPEC_FMV_X_W due to UNSPEC_SSP_SET and
UNSPEC_SSP_TEST being put in the unspecv enum instead of unspec.

gcc/ChangeLog:

* config/riscv/riscv.md: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST
to unspec enum.

[RISC-V] Fix risc-v expected test output after recent iv changes

Richard S's recent change to iv increment insertion removed a reg->reg move
(which was its intent AFAICT).  This triggered a failure on a riscv test.

That test was meant to verify that we didn't have an extraneous reg->reg move
due to a buglet in the risc-v splitters.  Before the 2023 change we had two
vector reg->reg moves and after the 2023 fix we had just one.  With Richard's
change we have none ;-)  Adjusting test accordingly.

Pushed to the trunk.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/madd-split2-1.c: Update expected
output.

avr.opt.urls += -mcvt

gcc/
* config/avr/avr.opt.urls: Add mcvt.

middle-end: Remove unused internal function after IVopts cleanup [PR118756]

It seems that after my IVopts patches the function contain_complex_addr_expr
became unused and clang is rightfully complaining about it.

This removes the unused internal function.

gcc/ChangeLog:

PR tree-optimization/118756
* tree-ssa-loop-ivopts.cc (contain_complex_addr_expr): Remove.

Fortran: Fix handling of the X edit descriptor.

This patch is a partial fix of handling of X edit descriptors
when combined with certain T edit descriptors.

PR libfortran/114618

libgfortran/ChangeLog:

* io/transfer.c (formatted_transfer_scalar_write): Change name
of vriable 'pos' to 'tab_pos' to improve clarity. Add new
variable next_pos when calculating the maximum position.
Update the calculation of pending spaces.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114618.f90: New test.

c++: Add no_unique_address attribute further test coverage [PR110345]

Another non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-no_unique_address1.C: New test.

c++: Add noreturn attribute further test coverage [PR110345]

Another non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-noreturn1.C: New test.

c++: Add nodiscard attribute further test coverage [PR110345]

Fairly non-problematic attribute.

2025-02-06 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-nodiscard1.C: New test.

AVR: Add support for a Compact Vector Table (-mcvt).

Some AVR devices support a CVT:

- Devices from the 0-series, 1-series, 2-series.
- AVR16, AVR32, AVR64, AVR128 devices.

The support is provided by means of a startup code file
crt<mcu>-cvt.o from AVR-LibC v2.3 that can be linked instead
of the traditional crt<mcu>.o.

This patch adds a new command line option -mcvt that links
that CVT startup code (or issues an error when the device
doesn't support a CVT).

PR target/118764
gcc/
* config/avr/avr.opt (-mcvt): New target option.
* config/avr/avr-arch.h (AVR_CVT): New enum value.
* config/avr/avr-mcus.def: Add AVR_CVT flag for devices that
support it.
* config/avr/avr.cc (avr_handle_isr_attribute) [TARGET_CVT]: Issue
an error when a vector number larger that 3 is used.
* config/avr/gen-avr-mmcu-specs.cc (McuInfo.have_cvt): New property.
(print_mcu) <*avrlibc_startfile>: Use crt<mcu>-cvt.o depending
on -mcvt (or issue an error when the device doesn't support a CVT).
* doc/invoke.texi (AVR Options): Document -mcvt.

Fortran: FIx ICE in associate with elemental function [PR118750]

2025-02-06 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/118750
* resolve.cc (resolve_assoc_var): If the target expression has
a rank, do not use gfc_expression_rank, since it will return 0
if the function is elemental. Resolution will have produced the
correct rank.

gcc/testsuite/
PR fortran/118750
* gfortran.dg/associate_72.f90: New test.

loop-iv, riscv: Fix get_biv_step_1 for RISC-V [PR117506]

The following test ICEs on RISC-V at least latently since
r14-1622-g99bfdb072e67fa3fe294d86b4b2a9f686f8d9705 which added
RISC-V specific case to get_biv_step_1 to recognize also
({zero,sign}_extend:DI (plus:SI op0 op1))

The reason for the ICE is that op1 in this case is CONST_POLY_INT
which unlike the really expected VOIDmode CONST_INTs has its own
mode and still satisfies CONSTANT_P.
GET_MODE (rhs) (SImode) is different from outer_mode (DImode), so
the function later does
        *inner_step = simplify_gen_binary (code, outer_mode,
                                           *inner_step, op1);
but that obviously ICEs because while *inner_step is either VOIDmode
or DImode, op1 has SImode.

The following patch fixes it by extending op1 using code so that
simplify_gen_binary can handle it.  Another option would be
to change the !CONSTANT_P (op1) 3 lines above this to
!CONST_INT_P (op1), I think it isn't very likely that we get something
useful from other constants there.

2025-02-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117506
* loop-iv.cc (get_biv_step_1): For {ZERO,SIGN}_EXTEND
of PLUS apply {ZERO,SIGN}_EXTEND to op1.

* gcc.dg/pr117506.c: New test.
* gcc.target/riscv/pr117506.c: New test.

AVR: genmultilib.awk - Use more robust parsing of spaces.

gcc/
PR target/118768
* config/avr/genmultilib.awk: Parse the AVR_MCU lines in
a more robust way w.r.t. white spaces.

LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561].

PR target/118561

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc
(loongarch_expand_builtin_lsx_test_branch):
NULL_RTX will not be returned when an error is detected.
(loongarch_expand_builtin): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118561.c: New test.

vect: Move induction IV increments [PR110449]

In this PR, we used to generate:

     .L6:
  mov     v30.16b, v31.16b
  fadd    v31.4s, v31.4s, v27.4s
  fadd    v29.4s, v30.4s, v28.4s
  stp     q30, q29, [x0]
  add     x0, x0, 32
  cmp     x1, x0
  bne     .L6

for an unrolled induction in:

  for (int i = 0; i < 1024; i++)
    {
      arr[i] = freq;
      freq += step;
    }

with the problem being the unnecessary MOV.

The main induction IV was incremented by VF * step == 2 * nunits * step,
and then nunits * step was added for the second store to arr.

The original patch for the PR (r14-2367-g224fd59b2dc8) avoided the MOV
by incrementing the IV by nunits * step twice.  The problem with that
approach is that it doubles the loop-carried latency.  This change was
deliberately not preserved when moving from loop-vect to SLP and so
the test started failing again after r15-3509-gd34cda720988.

I think the main problem is that we put the IV increment in the wrong
place.  Normal IVs created by create_iv are placed before the exit
condition where possible, but vectorizable_induction instead always
inserted them at the start of the loop body.  The only use of the
incremented IV is by the phi node, so the effect is to make both
the old and new IV values live for the whole loop body, which is
why we need the MOV.

The simplest fix therefore seems to be to reuse the create_iv logic.

gcc/
PR tree-optimization/110449
* tree-ssa-loop-manip.h (insert_iv_increment): Declare.
* tree-ssa-loop-manip.cc (insert_iv_increment): New function,
split out from...
(create_iv): ...here and generalized to gimple_seqs.
* tree-vect-loop.cc (vectorizable_induction): Use
standard_iv_increment_position and insert_iv_increment
to insert the IV increment.

gcc/testsuite/
PR tree-optimization/110449
* gcc.target/aarch64/pr110449.c: Expect an increment by 8.0,
but test that there is no MOV.

rtl-optimization/117922 - disable fold-mem-offsets for highly connected CFG

The PR shows fold-mem-offsets taking ages and a lot of memory computing
DU/UD chains as that requires the RD problem. The issue is not so much
the memory required for the pruned sets but the high CFG connectivity
(and that the CFG is cyclic) which makes solving the dataflow problem
expensive.

The following adds the same limit as the one imposed by GCSE and CPROP.

PR rtl-optimization/117922
* fold-mem-offsets.cc (pass_fold_mem_offsets::execute):
Do nothing for a highly connected CFG.

tree-optimization/118749 - bogus alignment peeling causes misaligned access

The vectorizer thinks it can align a vector access to 16 bytes when
using a vectorization factor of 8 and 1 byte elements. That of
course does not work for the 2nd vector iteration. Apparently we
lack a guard against such nonsense.

PR tree-optimization/118749
* tree-vect-data-refs.cc (vector_alignment_reachable_p): Pass
in the vectorization factor, when that cannot maintain
the DRs target alignment do not claim we can reach that
by peeling.

* gcc.dg/vect/pr118749.c: New testcase.

Daily bump.

[committed] Disable ABS instruction on bfin port

I was looking at a regression on the bfin port with a recent change to the IRA
and stumbled across this just doing a general port healthyness evaluation.

The ABS instruction in the blackfin ISA is defined as saturating on INT_MIN,
which is a bit unexpected. We certainly can't use it when -fwrapv is enabled.
Given the failures on the C23 uabs tests, I'm inclined to just disable the
pattern completely.

Fixes pr23047, uabs-2 and uabs-3.

While it's not a regression, it's the blackfin port, so I think we've got a
higher degree of freedom here.

Pushing to the trunk.

gcc/
* config/bfin/bfin.md (abssi): Disable pattern.

c++: Reject default arguments for template class friend functions [PR118319]

We segfault upon the following invalid code

=== cut here ===
template <int> struct S {
  friend void foo (int a = []{}());
};
void foo (int a) {}
int main () {
  S<0> t;
  foo ();
}
=== cut here ===

The problem is that we end up with a LAMBDA_EXPR callee in
set_flags_from_callee, and dereference its NULL_TREE
TREE_TYPE (TREE_TYPE (..)).

This patch sets the default argument to error_mark_node and gives a hard
error for template class friend functions that do not meet the
requirement in C++17 11.3.6/4 (the change is restricted to templates per
discussion with Jason).

PR c++/118319

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Inspect all friend function parameters.
If it's not valid for them to have a default value and we're
processing a template, set the default value to error_mark_node
and give a hard error.

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg18.C: New test.
* g++.dg/parse/defarg18a.C: New test.

[PR115568][LRA]: Use more strict output reload check in rematerialization

  In this PR case LRA rematerialized a value from inheritance insn
instead of output reload one.  This resulted in considering a
rematerilization candidate value available when it was actually
not.  As a consequence an insn after rematerliazation used the
unexpected value and this use resulted in fp exception.  The patch
fixes this bug.

gcc/ChangeLog:

PR rtl-optimization/115568
* lra-remat.cc (create_cands): Check that output reload insn is
adjacent to given insn.  Update a comment.

gcc/testsuite/ChangeLog:

PR rtl-optimization/115568
* gcc.target/i386/pr115568.c: New.

go: update builtin function attributes

PR go/118746
* go-gcc.cc (class Gcc_backend): Define builtin_cold,
builtin_leaf, builtin_nonnull. Alphabetize constants.
(Gcc_backend::Gcc_backend): Update attributes for builtin
functions to match builtins.def.
(Gcc_backend::define_builtin): Split out attribute setting into
set_attribtues.
(Gcc_backend::set_attribtues): New method split out of
define_builtin. Support new flag values.

aarch64: Fix sve/acle/general/ldff1_8.c failures

gcc.target/aarch64/sve/acle/general/ldff1_8.c and
gcc.target/aarch64/sve/ptest_1.c were failing because the
aarch64 port was giving a zero (unknown) cost to instructions
that compute two results in parallel.  This was latent until
r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs
as unknown.

A long-standing todo here is to make insn_cost derive costs from md
information, rather than having to write a lot of matching code in
aarch64_rtx_costs.  But that's not something we can do for GCC 15.

This patch instead treats the cost of a PARALLEL as being the maximum
cost of its constituent sets.  I don't like this very much, since it
isn't really target-specific behaviour.  If it were stage 1, I'd be
trying to change pattern_cost instead.

gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs
the same cost as the costliest SET.

Fortran/OpenMP: Add location data to 'sorry' [PR118740]

PR fortran/118740

gcc/fortran/ChangeLog:

* openmp.cc (gfc_match_omp_context_selector, match_omp_metadirective):
Change sorry to sorry_at and use gfc_current_locus as location.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise, but use n->where.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/append_args-2.f90: Update for line change.

cselib: Fix up previous patch for SPARC [PR117239]

Sorry, our CI bot just notified me I broke SPARC build. There are two
#ifdef STACK_ADDRESS_OFFSET
guarded snippets and the macro is only defined on SPARC target, so I didn't
notice there was a syntax error.

Fixed thusly.

2025-02-05 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/117239
* cselib.cc (cselib_init): Remove spurious closing paren in
the #ifdef STACK_ADDRESS_OFFSET specific code.

cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]

The following testcase is miscompiled on x86_64 during postreload.
After reload (with IPA-RA figuring out the calls don't modify any
registers but %rax for return value) postreload sees
(insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [0  S8 A64])
        (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
     (nil))
(call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
            (const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 baz>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (nil))
(insn 16 15 18 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int 24 [0x18])))
            (clobber (reg:CC 17 flags))
        ]) "pr117239.c":18:7 285 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 0 [0])
        (nil)))
...
(call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
            (const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 foo>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (nil))
(insn 21 19 26 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -24 [0xffffffffffffffe8])))
            (clobber (reg:CC 17 flags))
        ]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
        (nil)))
(insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [0  S8 A64])
        (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
     (nil))
i.e.
        movq    %rdx, 16(%rsp)
        call    baz
        addq    $24, %rsp
...
        call    foo
        subq    $24, %rsp
        movq    %rdx, 16(%rsp)
Now, postreload uses cselib and cselib remembered that %rdx value has been
stored into 16(%rsp).  Both baz and foo are pure calls.  If they weren't,
when processing those CALL_INSNs cselib would invalidate all MEMs
      if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
          || !(RTL_CONST_OR_PURE_CALL_P (insn)))
        cselib_invalidate_mem (callmem);
where callmem is (mem:BLK (scratch)).  But they are pure, so instead the
code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
The calls actually clobber more than that, even const/pure calls clobber
all memory below the stack pointer.  And that is something that hasn't been
invalidated.  In this failing testcase, the call to baz is not a big deal,
we don't have anything remembered in memory below %rsp at that call.
But then we increment %rsp by 24, so the %rsp+16 is now 8 bytes below stack
and do the call to foo.  And that call now actually, not just in theory,
clobbers the memory below the stack pointer (in particular overwrites it
with the return value).  But cselib does not invalidate.  Then %rsp
is decremented again (in preparation for another call, to bar) and cselib
is processing store of %rdx (which IPA-RA says has not been modified by
either baz or foo calls) to %rsp + 16, and it sees the memory already has
that value, so the store is useless, let's remove it.
But it is not, the call to foo has changed it, so it needs to be stored
again.

The following patch adds targetted invalidation of memory below stack
pointer (or on SPARC memory below stack pointer + 2047 when stack bias is
used, or on PA memory above stack pointer instead).
It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions,
because in other functions the stack pointer should be constant from
the end of prologue till start of epilogue and so nothing should be stored
within the function below the stack pointer.

Now, memory below stack pointer is special, except for functions using
alloca/VLAs I believe no addressable memory should be there, it should be
purely outgoing function argument area, if we take address of some automatic
variable, it should live all the time above the outgoing function argument
area.  So on top of just trying to flush memory below stack pointer
(represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches),
the patch tries to optimize and only invalidate memory that has address
clearly derived from stack pointer (memory with other bases is not
invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in
both VALUEs) it is above current stack, also don't call
canon_anti_dependence which might just give up in certain cases.

I've gathered statistics from x86_64-linux and i686-linux
bootstraps/regtests.  During -m64 compilations from those, there were
3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
(callmem[1]) calls, the first number is number of MEMs not invalidated
because of the optimization, i.e.
+             if (sp_derived_base == NULL_RTX)
+               {
+                 has_mem = true;
+                 num_mems++;
+                 p = &(*p)->next;
+                 continue;
+               }
in the patch, the second number is number of MEMs not invalidated because
canon_anti_dependence returned false and finally the last number is number
of MEMs actually invalidated (so that is what hasn't been invalidated
before).  During -m32 compilations the numbers were
1422412 + 39354 + 16509 with the same meaning.

Note, when there is no red zone, in theory even the sp = sp + incr
instruction invalidates memory below the new stack pointer, as signal
can come and overwrite the memory.  So maybe we should be invalidating
something at those instructions as well.  But in leaf functions we certainly
can have even addressable automatic vars in the red zone (which would make
it harder to distinguish), on the other side aren't normally storing
anything below the red zone, and in non-leaf it should normally be just the
outgoing arguments area.

2025-02-05  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117239
* cselib.cc: Include predict.h.
(callmem): Change type from rtx to rtx[2].
(cselib_preserve_only_values): Use callmem[0] rather than callmem.
(cselib_invalidate_mem): Optimize and don't try to invalidate
for the mem_rtx == callmem[1] case MEMs which clearly can't be
below the stack pointer.
(cselib_process_insn): Use callmem[0] rather than callmem.
For const/pure calls also call cselib_invalidate_mem (callmem[1])
in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions.
(cselib_init): Initialize callmem[0] rather than callmem and also
initialize callmem[1].

* gcc.dg/pr117239.c: New test.

arm: Use POP {pc} to return when returning [PR118089]

When generating thumb2 code,
LDM SP!, {PC}
is a two-byte instruction, whereas
LDR PC, [SP], #4
is needs 4 bytes. When optimizing for size, or when there's no obvious
performance benefit prefer the former.

gcc/ChangeLog:

PR target/118089
* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
when optimizing for size, or when there's no performance benefit over
LDR PC, [SP], #4.
(arm_expand_epilogue): Likewise.

arm: remove constraints from *pop_multiple_with_writeback_and_return

This pattern is intended to be used only by the epilogue generation
code and will always use fixed hard registers.  As such, it does not
need any register constraints, which might be misleading if a
post-reload pass wanted to try renumbering various registers.  So
remove the constraints.

Futhermore, to permit this pattern to match when popping just the PC
(which is not a valid register_operand), remove the match on the first
transfer register: pop_multiple_return will validate everything it
needs to.

gcc/ChangeLog:

* config/arm/arm.md (*pop_multiple_with_writeback_and_return): Remove
constraints.  Don't validate the first transfer register here.

arm: cleanup code in ldm_stm_operation_p; relax limits on ldm/stm

I needed to make some adjustments to this function to permit a push or
pop of a single register in thumb2 code, since ldm/stm can be a
two-byte instruction instead of 4.  Trying to read the code as it was
made me scratch my head as the logic was not very clear.  So this
patch cleans up the code somewhat, fixes a couple of minor bugs and
removes the limit of having to use multiple registers when using this
form of the instruction (the shape of this pattern is such that I
can't see it being generated automatically by the compiler, so there
should be no adverse affects of this).

Buglets fixed:
  - Validate that the first element contains RETURN if we're matching
    a return instruction.
  - Don't allow the base address register to be stored if saving regs
    and the address is being updated (this is unpredictable in the
    architecture).
  - Verify that the last register loaded in a RETURN insn is the PC.

gcc/
* config/arm/arm.cc (decompose_addr_for_ldm_stm): New function.
(ldm_stm_operation_p): Rework to clarify logic.  Allow single
registers to be pushed or popped using LDM/STM.

vect: Fix wrong code with pr108692.c on targets with only non-widening ABD [PR118727]

With things like

  // signed char a_14, a_16;
  a.0_4 = (unsigned char) a_14;
  _5 = (int) a.0_4;
  b.1_6 = (unsigned char) b_16;
  _7 = (int) b.1_6;
  c_17 = _5 - _7;
  _8 = ABS_EXPR <c_17>;
  r_18 = _8 + r_23;

An ABD pattern will be recognized for _8:

  patt_31 = .ABD (a.0_4, b.1_6);

It's still correct.  But then when the SAD pattern is recognized:

  patt_29 = SAD_EXPR <a_14, b_16, r_23>;

This is not correct.  This only happens for targets with both uabd and
sabd but not vec_widen_{s,u}abd, currently LoongArch is the only target
affected.

The problem is vect_look_through_possible_promotion will throw away a
series of conversions if the effect is equivalent to a sign change and a
promotion, but here the sign change is definitely relevant, and the
promotion is also relevant for "mixed sign" cases like
r += abs((unsigned int)(unsigned char) a - (signed int)(signed char) b
(we need to promote to HImode as the difference can exceed the range of
QImode).

If there were any redundant promotion, it should have been stripped in
vect_recog_abd_pattern (i.e. when patt_31 = .ABD (a.0_4, b.1_6) is
recognized) instead of in vect_recog_sad_pattern, or we'd have a
missed-optimization if the ABD output is not summerized.  So anyway
vect_recog_sad_pattern is just not a proper location to call
vect_look_through_possible_promotion for the ABD inputs, remove the
calls to fix the issue.

gcc/ChangeLog:

PR tree-optimization/118727
* tree-vect-patterns.cc (vect_recog_sad_pattern): Don't call
vect_look_through_possible_promotion on ABD inputs.

gcc/testsuite/ChangeLog:

PR tree-optimization/118727
* gcc.dg/pr108692.c: Mention PR 118727 in the comment.
* gcc.dg/pr118727.c: New test case.

testsuite: Revert to the original version of pr100056.c

r15-268-g9dbff9c05520 restored the original GCC 11 output for
pr100056.c, so this patch reverts the changes made to the test
in r12-7259-g25332d2325c7. (The code parts of r12-7259 still
seem useful, as a belt-and-braces thing.)

gcc/testsuite/
* gcc.target/aarch64/pr100056.c: Restore the original version of
the scan-assemblers.

libstdc++: Fix gnu.ver CXXABI_1.3.16 for Solaris [PR118701]

This patch

commit c6977f765838a5ca8d321d916221a7368622bdd9
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue Jan 21 23:50:15 2025 +0100

    libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V

broke the libstdc++-abi/abi_check test on Solaris: the log shows

1 incompatible symbols
0
Argument "{CXXABI_1.3.15}" isn't numeric in numeric eq (==) at /vol/gcc/src/hg/master/local/libstdc++-v3/scripts/extract_symvers.pl line 129.
version status: incompatible
type: uncategorized
status: added

The problem has two parts:

* The patch above introduced a new version in libstdc++.so,
  CXXABI_1.3.16, which everywhere but on RISC-V contains no symbols (a
  weak version).  This is the first time this happened in libstdc++.

* Solaris uses scripts/extract_symvers.pl to determine the version info.
  The script currently chokes on the pvs output for weak versions:

  libstdc++.so.6.0.34 - CXXABI_1.3.16 [WEAK]: {CXXABI_1.3.15};

  instead of

  libstdc++.so.6.0.34 - CXXABI_1.3.16: {CXXABI_1.3.15};

While this patch hardens the script to cope with weak versions, there's
no reason to introduce them in the first place.  So the new version is
only created on __riscv.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

2025-01-29  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
    Jonathan Wakely  <jwakely@redhat.com>

libstdc++-v3:
PR libstdc++/118701
* config/abi/pre/gnu.ver (CXXABI_1.3.16): Move __riscv guard
around version.
* scripts/extract_symvers.pl: Allow for weak versions.
* testsuite/util/testsuite_abi.cc (check_version): Wrap
CXXABI_1.3.16 in __riscv.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself.

fortran/trans-openmp.cc: Use the correct member in gfc_omp_namelist [PR118745]

gcc/fortran/ChangeLog:

PR fortran/118745
* trans-openmp.cc (gfc_trans_omp_declare_variant): Use
append_args_list in the condition for the append_arg location.

Fortran: Fix PR 47485.

The -MT and -MQ options should replace the default target in the
generated dependency file. deps_add_target needs to be called before
cpp_read_main_file, otherwise the original object name is added.

Contributed by Vincent Vanlaer <vincenttc@volkihar.be>

PR fortran/47485

gcc/fortran/ChangeLog:

* cpp.cc: fix -MT/-MQ adding additional target instead of
replacing the default.

gcc/testsuite/ChangeLog:

* gfortran.dg/dependency_generation_1.f90: New test.

Signed-off-by: Vincent Vanlaer <vincenttc@volkihar.be>

RTEMS: Add Cortex-M33 multilib

Enable use of Armv8-M instruction set.

Account for CVE-2021-35465 mitigation [PR102035].
The -mfix-cmse-cve-2021-35465 option is enabled by default,
if -mcpu=cortex-m33 is used.

gcc/

* config/arm/t-rtems: Add Cortex-M33 multilib.

Daily bump.

PR modula2/115112 Incorrect line debugging information occurs during INC builtin

This patch fixes location bugs in BuildDecProcedure,
BuildIncProcedure, BuildInclProcedure, BuildExclProcedure and
BuildThrow. All these procedure functions use the token position
passed as a parameter (rather than from the quad stack). It also
fixes location bugs in CheckRangeIncDec to ensure that the token
position is stored on the quad stack before calling subsidiary
procedure functions.

gcc/m2/ChangeLog:

PR modula2/115112
* gm2-compiler/M2Quads.mod (BuildPseudoProcedureCall): Pass
tokno to each build procedure.
(BuildThrowProcedure): New parameter functok.
(BuildIncProcedure): New parameter proctok.
Pass proctok on the quad stack during every push.
(BuildDecProcedure): Ditto.
(BuildInclProcedure): New parameter proctok.
(BuildExclProcedure): New parameter proctok.

gcc/testsuite/ChangeLog:

PR modula2/115112
* gm2/pim/run/pass/dectest.mod: New test.
* gm2/pim/run/pass/inctest.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: add fixed test [PR94100]

The recent r15-7339-g26d3424ca5d9f4 fixed this test too.

PR c++/94100

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic188.C: New test.

c++: Fix ICE with #embed/RAW_DATA_CST after list conversion [PR118671]

The following testcases ICE with RAW_DATA_CSTs (so the first one since
introduction of #embed C++ optimizations and the latter since optimization
of large sequences of comma separated literals).
I've missed the fact that implicit_conversion can embed the exact expression
passed to it into stuff pointed out by conversion * (e.g. for user
conversions in sub->cand->args).
So, it isn't enough in convert_like_internal to pass the right INTEGER_CST
for each element of the RAW_DATA_CST because the whole RAW_DATA_CST might be
in sub->cand->args etc.
Either I'd need to chase for wherever the RAW_DATA_CST is found and update
those for each element processed, or, as implemented in the following patch,
build_list_conv detects the easy optimizable case where
convert_like_internal can be kept as the whole RAW_DATA_CST with changed
type and possibly narrowing diagnostics, and otherwise instead of having
a single subconversion it has RAW_DATA_CST separate subconversions.
Instead of trying to reallocate the subconvs array when we detect that case,
the patch instead uses an artificial ck_list inside of the u.list array
to hold the individual subconversions.
Seems the only places where u.list is used are build_list_conv and
convert_like_internal.

2025-02-04 Jakub Jelinek <jakub@redhat.com>

PR c++/118671
* call.cc (build_list_conv): For RAW_DATA_CST, call
implicit_conversion with INTEGER_CST representing first byte instead
of the whole RAW_DATA_CST. If it is an optimizable trivial
conversion, just save that to subconvs, otherwise allocate an
artificial ck_list for all the RAW_DATA_CST bytes and create
subsubconv for each of them.
(convert_like_internal): For ck_list with RAW_DATA_CST, instead of
doing all the checks for optimizable conversion just check kind and
assert everything else, otherwise use subsubconversions instead of
the subconversion for each element.

* g++.dg/cpp/embed-25.C: New test.
* g++.dg/cpp0x/pr118671.C: New test.

Ada: Fix assertion failure with iterator in container aggregate

It's just a missing test for the presence of a nonempty parameter.

gcc/ada/
PR ada/118731
* sem_aggr.adb (Resolve_Iterated_Association): Add missing guard.

testsuite: RISC-V: Ignore pr118170.c for E ABI

The -mcpu=tt-ascalon-d8 option for the test implies D extension, which
is not compatible with the ILP32E and ILP64E ABIs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr118170.c: Ignore for E ABI.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Fix file cache tunables documentation

Document new params in invoke.texi.

The auto tuning description was on the wrong tunable, move to lines.

Comitted as obvious.

gcc/ChangeLog:

* doc/invoke.texi: Document file cache tunables.
* params.opt: Move auto tuning description to lines.

arm: testsuite: Adapt mve-vabs.c to improved codegen

Since commit r15-491-gc290e6a0b7a9de this failure happens on
armv8l-linux-gnueabihf and arm-eabi:

Running gcc:gcc.target/arm/simd/simd.exp ...
gcc.target/arm/simd/mve-vabs.c: memmove found 0 times
FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3

In PR PR target/116010, Andrew Pinski noted that
"gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict
instead of memmove. That should be a simple fix there."

Therefore change the test to expect memcpy rather than memmove.

Another change is that memcpy is inlined rather than called, so also change
the test to check the optimized tree dump rather than the generated
assembly.

Tested on armv8l-linux-gnueabihf and arm-eabi.

gcc/testsuite/ChangeLog:
PR target/116010
* gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new
code.

Suggested-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: auto in trailing-return-type in parameter [PR117778]

This PR describes a few issues, both ICE and rejects-valid, but
ultimately the problem is that we don't properly synthesize the
second auto in:

  int
  g (auto fp() -> auto)
  {
    return fp ();
  }

since r12-5860, which disabled auto_is_implicit_function_template_parm_p
in cp_parser_parameter_declaration after parsing the decl-specifier-seq.

If there is no trailing auto, there is no problem.

So we have to make sure auto_is_implicit_function_template_parm_p is
properly set when parsing the trailing auto.  A complication is that
one can write:

  auto f (auto fp(auto fp2() -> auto) -> auto) -> auto;
                                      ~~~~~~~

where only the underlined auto should be synthesized.  So when we
parse a parameter-declaration-clause inside another
parameter-declaration-clause, we should not enable the flag.  We
have no flags to keep track of such nesting, but I think I can walk
current_binding_level to see if we find ourselves in such an unlikely
scenario.

PR c++/117778

gcc/cp/ChangeLog:

* parser.cc (cp_parser_late_return_type_opt): Maybe override
auto_is_implicit_function_template_parm_p.
(cp_parser_parameter_declaration): Move a make_temp_override below.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-117778.C: New test.
* g++.dg/cpp2a/abbrev-fn2.C: New test.
* g++.dg/cpp2a/abbrev-fn3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: bogus -Wvexing-parse with trailing-return-type [PR118718]

This warning should not warn for

auto f1 () -> auto;

because that cannot be confused with initializing a variable.

PR c++/118718

gcc/cp/ChangeLog:

* parser.cc (warn_about_ambiguous_parse): Don't warn when a trailing
return type is present.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wvexing-parse10.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

testsuite: XFAIL test in pr109393.c for ilp32 targets [PR116845]

The match.pd canonicalization that this testcase checks for,
is not applied on ilp32 targets.

This XFAILs the test on ilp32 targets.

PR testsuite/116845

gcc/testsuite/ChangeLog:

* gcc.dg/pr109393.c: XFAIL on ilp32 targets.

c/118742 - gimple FE parsing of unary operators of C promoted args

The GIMPLE FE currently invokes parser_build_unary_op to build
unary GENERIC which has the operand subject to C promotion rules
which does not match GIMPLE. The following adds a wrapper around
the build_unary_op worker which conveniently has an argument to
indicate whether to skip such promotion.

PR c/118742
gcc/c/
* gimple-parser.cc (gimple_parser_build_unary_op): New
wrapper around build_unary_op.
(c_parser_gimple_unary_expression): Use it.

gcc/testsuite/
* gcc.dg/gimplefe-56.c: New testcase.

IBM zSystems: Do not use @PLT with larl

Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in
64-bit mode") made GCC call both static and non-static functions and
load both static and non-static function addresses with the @PLT
suffix.  This made it difficult for linkers to distinguish calling and
address taking instructions [1].  It is currently assumed that the
R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used
only for calling, and the R_390_PC32DBL relocation, corresponding to
the empty suffix, is used only for address taking.

Linkers needs to make this distinction in order to decide whether to
ask ld.so to use canonical PLT entries.  Normally GOT entries in shared
objects contain addresses of the respective functions, with one notable
exception: when a no-pie executable calls the respective function and
also takes its address.  Such executables assume that all addresses are
known in advance, so they use addresses of the respective PLT entries.
For consistency reasons, all respective GOT entries in the process must
also use them.

When a linker sees that a no-pie executable both calls a function and
also takes its address, it creates a PLT entry and asks ld.so to
consider it canonical by setting the respective undefined symbol's
address, which is normally 0, to the address of this PLT entry.

Improve the situation by not using @PLT with larl.

Now that @PLT is not used with larl, also drop the 31-bit handling,
which was required because 31-bit PLT entries require %r12 to point to
the respective object's GOT, and this requirement is not satisfied when
calling them by pointer from another object.

Also drop the weak symbol handling, which was required because it is
not possible to load an undefined weak symbol address (0) using larl.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29655

gcc/ChangeLog:

* config/s390/s390.cc (print_operand): Remove the no longer
necessary 31-bit and weak symbol handling.
* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
(*movsi_larl): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/call-z10-pic-nodatarel.c: Adjust
expectations.
* gcc.target/s390/call-z10-pic.c: Likewise.
* gcc.target/s390/call-z10.c: Likewise.
* gcc.target/s390/call-z9-pic-nodatarel.c: Likewise.
* gcc.target/s390/call-z9-pic.c: Likewise.
* gcc.target/s390/call-z9.c: Likewise.

c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case

=== cut here ===
struct A {
  virtual operator int() { return 42; }
  virtual operator char() = 0;
};
struct B : public A {
  operator char() { return 'A'; }
};
=== cut here ===

The problem is that when iterating over ovl_range (fns), warn_hidden
gets confused by the conversion operator marker, concludes that
seen_non_override is true and therefore emits a warning for all
conversion operators in A that do not convert to char, even if
-Woverloaded-virtual is 1 (e.g. with -Wall, the case reported).

A second set of problems is highlighted when -Woverloaded-virtual is 2.

First, with the same test case, since base_fndecls contains all
conversion operators in A (except the one to char, that's been removed
when iterating over ovl_range (fns)), we emit a spurious warning for
the conversion operator to int, even though it's unrelated.

Second, in case there are several conversion operators with different
cv-qualifiers to the same type in A, we rightfully emit a warning,
however the note uses the location of the conversion operator marker
instead of the right one; location_of should go over conv_op_marker.

This patch fixes all these by explicitly keeping track of (1) base
methods that are overriden, as well as (2) base methods that are hidden
but not overriden (and by what), and warning about methods that are in
(2) but not (1). It also ignores non virtual base methods, per
"definition" of -Woverloaded-virtual.

Co-authored-by: Jason Merrill <jason@redhat.com>
PR c++/117114
PR c++/109918

gcc/cp/ChangeLog:

* class.cc (warn_hidden): Keep track of overloaded and of hidden
base methods.
* error.cc (location_of): Skip over conv_op_marker.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Woverloaded-virt1.C: Check that no warning is
emitted for non virtual base methods.
* g++.dg/warn/Woverloaded-virt10.C: New test.
* g++.dg/warn/Woverloaded-virt11.C: New test.
* g++.dg/warn/Woverloaded-virt12.C: New test.
* g++.dg/warn/Woverloaded-virt13.C: New test.
* g++.dg/warn/Woverloaded-virt5.C: New test.
* g++.dg/warn/Woverloaded-virt6.C: New test.
* g++.dg/warn/Woverloaded-virt7.C: New test.
* g++.dg/warn/Woverloaded-virt8.C: New test.
* g++.dg/warn/Woverloaded-virt9.C: New test.

tree-optimization/117113 - ICE with unroll-and-jam

When there's an inner loop without virtual header PHI but the outer
loop has one the fusion process cannot handle the need to create
an inner loop virtual header PHI. Punt in this case.

PR tree-optimization/117113
* gimple-loop-jam.cc (unroll_jam_possible_p): Detect when
we cannot handle virtual SSA update.

* gcc.dg/torture/pr117113.c: New testcase.

c++: Properly detect calls to digest_init in build_vec_init [PR114619]

We currently ICE in checking mode with cxx_dialect < 17 on the following
valid code

=== cut here ===
struct X {
X(const X&) {}
};
extern X x;
void foo () {
new X[1]{x};
}
=== cut here ===

We trip on a gcc_checking_assert in cp_gimplify_expr due to a
TARGET_EXPR that is not TARGET_EXPR_ELIDING_P. As pointed by Jason, the
problem is that build_vec_init does not recognize that digest_init has
been called, and we end up calling the copy constructor twice.

This happens because the detection in build_vec_init assumes that BASE
is a reference to the array, while it's a pointer to its first element
here. This patch makes sure that the detection works in both cases.

PR c++/114619

gcc/cp/ChangeLog:

* init.cc (build_vec_init): Properly determine whether
digest_init has been called.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide4.C: New test.

c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]

As mentioned in the PR, this pedwarni is desirable for the implicit or
explicit capturing of structured bindings in C++17, but in the case of
init-captures the initializer is just some expression and that can include
structured bindings.

So, the following patch limits the warning to non-explicit_init_p.

2025-02-04 Jakub Jelinek <jakub@redhat.com>

PR c++/118719
* lambda.cc (add_capture): Only pedwarn about capturing structured
binding if !explicit_init_p.

* g++.dg/cpp1z/decomp63.C: New test.

optabs: Fix widening optabs for vec-mode -> scalar-mode [PR116926]

r15-4317-ga6f4404689f12 tried to add support for widending optabs
for vec-mode -> scalar-mode but it misunderstood how FOR_EACH_MODE worked,
the limit in this case is not inclusive. Which means setting limit to from,
would cause the loop not be executed at all. This fixes by setting the
limit to be the next mode after from mode.

Note the original version that added the widening optabs for vec-mode -> scalar-mode
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665021.html) didn't have this
bug, only the second version with suggested change
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665068.html) dud. The suggested
change missed this issue with FOR_EACH_MODE.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116926

gcc/ChangeLog:

* optabs-query.cc (find_widening_optab_handler_and_mode): Fix
limit for `vec-mode -> scalar-mode` case.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Add modular exponentiation for UNSIGNED.

gcc/fortran/ChangeLog:

* arith.cc (arith_power): Handle modular arithmetic for
BT_UNSIGNED.
(eval_intrinsic):  Error for unsigned exponentiation with
-pedantic.
* expr.cc (gfc_type_convert_binary): Use type of first
argument for unsigned exponentiation.
* gfortran.texi: Mention arithmetic exponentiation.
* resolve.cc (resolve_operator): Allow unsigned exponentiation.
* trans-decl.cc (gfc_build_intrinsic_function_decls): Build
declarations for unsigned exponentiation.
* trans-expr.cc (gfc_conv_cst_uint_power): New function.
(gfc_conv_power_op): Call it.  Handle unsigned exponentiation.
* trans.h (gfor_fndecl_unsigned_pow_list):  Add declaration.

libgfortran/ChangeLog:

* Makefile.am: Add files for unsigned exponentiation.
* Makefile.in: Regenerate.
* gfortran.map: Add functions for unsigned exponentiation.
* generated/pow_m16_m1.c: New file.
* generated/pow_m16_m16.c: New file.
* generated/pow_m16_m2.c: New file.
* generated/pow_m16_m4.c: New file.
* generated/pow_m16_m8.c: New file.
* generated/pow_m1_m1.c: New file.
* generated/pow_m1_m16.c: New file.
* generated/pow_m1_m2.c: New file.
* generated/pow_m1_m4.c: New file.
* generated/pow_m1_m8.c: New file.
* generated/pow_m2_m1.c: New file.
* generated/pow_m2_m16.c: New file.
* generated/pow_m2_m2.c: New file.
* generated/pow_m2_m4.c: New file.
* generated/pow_m2_m8.c: New file.
* generated/pow_m4_m1.c: New file.
* generated/pow_m4_m16.c: New file.
* generated/pow_m4_m2.c: New file.
* generated/pow_m4_m4.c: New file.
* generated/pow_m4_m8.c: New file.
* generated/pow_m8_m1.c: New file.
* generated/pow_m8_m16.c: New file.
* generated/pow_m8_m2.c: New file.
* generated/pow_m8_m4.c: New file.
* generated/pow_m8_m8.c: New file.
* m4/powu.m4: New file.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_15.f90: Adjust error messages.
* gfortran.dg/unsigned_43.f90: New test.
* gfortran.dg/unsigned_44.f90: New test.

rtl-optimization/117611 - ICE in simplify_shift_const_1

The following checks we have a scalar int shift mode before
enforcing it. As AVR shows the mode can be a signed _Accum mode
as well.

PR rtl-optimization/117611
* combine.cc (simplify_shift_const_1): Bail if not
scalar int mode.

* gcc.dg/fixed-point/pr117611.c: New testcase.

lto/113207 - fix free_lang_data_in_type

When we process function types we strip volatile and const qualifiers
after building a simplified type variant (which preserves those).
The qualified type handling of both isn't really compatible, so avoid
bad interaction by swapping this, first dropping const/volatile
qualifiers and then building the simplified type thereof.

PR lto/113207
* ipa-free-lang-data.cc (free_lang_data_in_type): First drop
const/volatile qualifiers from function argument types,
then build a simplified type.

* gcc.dg/pr113207.c: New testcase.

c++: Improve contracts support in modules [PR108205]

Modules makes some assumptions about types that currently aren't
fulfilled by the types created in contracts logic. This patch ensures
that exporting inline functions using contracts works again with
modules.

PR c++/108205

gcc/cp/ChangeLog:

* contracts.cc (get_pseudo_contract_violation_type): Give names
to generated FIELD_DECLs.
(declare_handle_contract_violation): Mark contract_violation
type as external linkage.
(build_contract_handler_call): Ensure any builtin declarations
created here aren't treated as attached to the current module.

gcc/testsuite/ChangeLog:

* g++.dg/modules/contracts-5_a.C: New test.
* g++.dg/modules/contracts-5_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Modularise start_cleanup_fn [PR98893]

'start_cleanup_fn' is not currently viable in modules, due to generating
functions relying on the 'start_cleanup_cnt' counter which is reset to 0
with each new TU. This means that cleanup functions declared in a TU
will conflict with any imported cleanup functions.

This patch mitigates the problem by using the mangled name of the decl
we're destroying as part of the name of the function. This should avoid
clashes unless the decls would have clashed anyway.

PR c++/98893

gcc/cp/ChangeLog:

* decl.cc (start_cleanup_fn): Make name from the mangled name of
the passed-in decl.
(register_dtor_fn): Pass decl to start_cleanup_fn.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr98893_a.H: New test.
* g++.dg/modules/pr98893_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Daily bump.

c++: find A pack from B in <typename...A,Class<A>...B> [PR118265]

For non-type parameter packs when unifying the arguments in
unify_pack_expansion it iterates over the associated packs of a param so
that when it recursively unifies the param with the arguments it knows
which targs have been populated with parameter pack arguments that it can
then collect up. This change adds a tree walk so that in the example above
it reaches ...A and adds it to the associated packs for ...B and therefore
knows it will have been set in targs in unify_pack_expansion and processes
it as per other pack arguments.

PR c++/118265

gcc/cp/ChangeLog:

* pt.cc (find_parameter_packs_r) <case TEMPLATE_PARM_INDEX>:
Walk into the type of a parameter pack.

Signed-off-by: Adam J Ryan <gcc.gnu.org@ajryansolutions.co.uk>

c++/coroutines: Fix awaiter var creation [PR116506]

Awaiters always need to have a coroutine state frame copy since
they persist across potential supensions. It simplifies the later
analysis considerably to assign these early which we do when
building co_await expressions.

The cleanups in r15-3146-g47dbd69b1, unfortunately elided some of
processing used to cater for cases where the var created from an
xvalue, or is a pointer/reference type.

Corrected thus.

PR c++/116506
PR c++/116880

gcc/cp/ChangeLog:

* coroutines.cc (build_co_await): Ensure that xvalues are
materialised. Handle references/pointer values in awaiter
access expressions.
(is_stable_lvalue): New.
* decl.cc (cxx_maybe_build_cleanup): Handle null arg.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116506.C: New test.
* g++.dg/coroutines/pr116880.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jason Merrill <jason@redhat.com>

c++: coroutines and range for [PR118491]

The implementation of extended range-for temporaries in r15-3840 confused
coroutines, because await_statement_walker and the like get confused by the
EXPR_STMT into thinking that the whole for-loop is a single expression
statement and try to process it accordingly. Fixing this seems to be a
simple matter of dropping the EXPR_STMT.

PR c++/116914
PR c++/117231
PR c++/118470
PR c++/118491

gcc/cp/ChangeLog:

* semantics.cc (finish_for_stmt): Don't wrap the result of
pop_stmt_list in EXPR_STMT.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-range-for1.C: New test.

Fortran: different character lengths in array constructor [PR93289]

PR fortran/93289

gcc/fortran/ChangeLog:

* decl.cc (gfc_set_constant_character_len): Downgrade different
string lengths in character array constructor to legacy extension.

gcc/testsuite/ChangeLog:

* gfortran.dg/unlimited_polymorphic_1.f03: Pad element in character
array constructor to correct length.
* gfortran.dg/char_array_constructor_5.f90: New test.

i386: Fix and improve TARGET_INDIRECT_BRANCH_REGISTER handling some more

gcc/ChangeLog:

* config/i386/i386.md (*sibcall_pop_memory):
Disable for TARGET_INDIRECT_BRANCH_REGISTER
* config/i386/predicates.md (call_insn_operand): Enable when
"satisfies_constraint_Bw (op)" is true, instead of open-coding
constraint here.
(sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)"

aarch64: Fix dupq_* testsuite failures

This patch fixes the dupq_* testsuite failures.  The tests were
introduced with r15-3669-ga92f54f580c3 (which was a nice improvement)
and Pengxuan originally had a follow-on patch to recognise INDEX
constants during vec_init.

I'd originally wanted to solve this a different way, using wildcards
when building a vector and letting vector_builder::finalize find the
best way of filling them in.  I no longer think that's the best
approach though.  Stepped constants are likely to be more expensive
than unstepped constants, so we should first try finding an unstepped
constant that is valid, even if it has a longer representation than
the stepped version.

This patch therefore uses a variant of Pengxuan's idea.

While there, I noticed that the (old) code for finding an unstepped
constant only tried varying one bit at a time.  So for index 0 in a
16-element constant, the code would try picking a constant from index 8,
4, 2, and then 1.  But since the goal is to create "fewer, larger,
repeating parts", it would be better to iterate over a bit-reversed
increment, so that after trying an XOR with 0 and 8, we try adding 4
to each previous attempt, then 2 to each previous attempt, and so on.
In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ...

The test shows an example of this for 8 shorts.

gcc/
* config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant):
New function, split out from...
(aarch64_expand_vector_init_fallback): ...here.  Use a bit-
reversed increment to find a constant index.  Add support for
stepped constants.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.

hppa: Revise various millicode insn patterns to use match_operand

LRA does not correctly support hard-register input operands that
are clobbered.  This is needed to support millicode calls on hppa.
The operand setup is sometimes deleted.

This problem can be avoided by hiding hard-register input operands
using match_operand.  This also potentially allows for constraints
that specify the operand is both read and written.

2025-02-03  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR rtl-optimization/117248
* config/pa/predicates.md (r25_operand): New predicate.
(r26_operand): Likewise.
* config/pa/pa.md: Use match_operand for r25 and r26 hard
register operands in mult, div, udiv, mod and umod millicode
patterns.

c++/79786 - bougs invocation of DATA_ABI_ALIGNMENT macro

The first argument is supposed to be a type, not a decl.

PR c++/79786
gcc/cp/
* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.