git.ipfire.org Git - thirdparty/gcc.git/log

Fix comment typos in tree-assume.cc

gcc/ChangeLog:

* tree-assume.cc: Fix comment typos.

libgomp.texi: Update 'arch' context-selector description

* libgomp.texi (OpenMP Context Selectors): Document that 'kind' also
accepts 'cpu'/'any' on host and 'any'/'nohost' on 'nohost' devices.

testsuite: arm: Use effective-target for memset-inline* tests

Split tests into 2 parts:
- The first part checks the generated assembler.
- The second part does the run test and this part now requires
effective-target arm_neon_hw.

gcc/testsuite/ChangeLog:

* gcc.target/arm/memset-inline-4.c: Only check assembler output.
* gcc.target/arm/memset-inline-5.c: Likewise.
* gcc.target/arm/memset-inline-6.c: Likewise.
* gcc.target/arm/memset-inline-8.c: Likewise.
* gcc.target/arm/memset-inline-9.c: Likewise.
* gcc.target/arm/memset-inline-4-exe.c: New test.
* gcc.target/arm/memset-inline-5-exe.c: Likewise.
* gcc.target/arm/memset-inline-6-exe.c: Likewise.
* gcc.target/arm/memset-inline-8-exe.c: Likewise.
* gcc.target/arm/memset-inline-9-exe.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: C++26 uses __equal() instead of operator==()

Update the test case to align with the function used in C++26.

gcc/testsuite/ChangeLog:

* g++.dg/abi/arm_rtti1.C: Check for expected symbol in C++26.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: Fix toplevel-asm-1.c failure for riscv

On Wed, Dec 18, 2024 at 01:19:43PM +0100, Andreas Schwab wrote:
> On Dez 12 2024, Jakub Jelinek wrote:
>
> > The intent was to test %cN because %N doesn't DTRT on various targets.
> > I have a patch to add %ccN support which should then work even on riscv
> > hopefully, but unfortunately it hasn't been fully reviewed yet.
>
> That didn't change toplevel-asm-1, so the failure remains.

Yes, I've only committed what was approved.

The following patch ought to fix this (and if there are other targets which
don't really support %cN for SYMBOL_REFs even with -fno-pic, they can be
added there too; I think it is useful to test %cN on the targets where it
works though).

2024-12-19 Jakub Jelinek <jakub@redhat.com>

* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3 %c4
on riscv.

RISC-V: Adjust the strided store testcases check times on options

The number of vsse* instructions checked in the dump now depends on the
optimization option (O2, O3), after we added (mem:BLK (scratch)) to the
define_insn of strided load.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: Adjust
the vsse check times based on optimization option.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Make vector strided store alias all other memories

As with the RVV strided load, the vector strided store did not include
(mem:BLK (scratch)) to alias all other memories. Without it, alias
analysis only considers the base address of the strided store.
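To illustrate what is at stake, here is a hypothetical C function (not the contents of pr118075-run-1.c) of the kind a strided-store run test might exercise: a strided store followed by a load from an element other than the base address. Whether the loop is actually turned into vsse depends on the target options; the function name is an assumption.

```c
#include <stdint.h>

/* Hypothetical: a loop the vectorizer may turn into a strided store.
   Only p[0] is at the base address.  If alias analysis treated the
   store as touching only p[0], the final load of p[2 * stride] could be
   moved above the stores and observe a stale value.  */
int64_t store_then_load (int64_t *p, long stride, int n)
{
  for (int i = 0; i < n; i++)
    p[i * stride] = 42;
  return p[2 * stride];
}
```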

PR target/118075

gcc/ChangeLog:

* config/riscv/vector.md: Add the (mem:BLK (scratch)) as the
lhs of strided store define insn.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118075-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

ifcombine field merge: handle masks with sign extensions

When a loaded field is sign extended, masked and compared, we used to
drop from the mask the bits past the original field width, which is
not correct.

Take note of the fact that the mask covered copies of the sign bit,
before clipping it, and arrange to test the sign bit if we're
comparing with zero.  Punt in other cases.

If bits_test fails recoverably, try other ifcombine strategies.
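A minimal C sketch of why those extra mask bits matter, assuming a 4-bit signed bit-field (the struct and function names are hypothetical, not taken from field-merge-16.c): after sign extension, mask bits past the field width are copies of the sign bit, so the masked compare with zero is really a sign-bit test.

```c
#include <stdbool.h>

/* A 4-bit signed bit-field: values -8..7, sign bit is bit 3.  */
struct S { signed int f : 4; };

/* s.f is promoted to int with sign extension, so mask bits 4 and 5 lie
   past the field width and are copies of the sign bit.  */
bool masked_nonzero (struct S s) { return (s.f & 0x30) != 0; }

/* The equivalent sign-bit test.  Clipping the mask to the field width
   instead would turn 0x30 into 0, making the compare always false.  */
bool sign_bit_set (struct S s) { return s.f < 0; }
```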

for  gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Add psignbit
parameter.  Set it if the mask references sign-extending
bits.
(fold_truth_andor_for_ifcombine): Adjust calls with new
variables.  Swap them along with other r?_* variables.  Handle
extended sign bit compares with zero.
* tree-ssa-ifcombine.cc (ifcombine_ifandif): If bits_test
fails in a way that doesn't prevent other ifcombine strategies
from passing, give them a try.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-16.c: New.

ifcombine field merge: handle bitfield zero tests in range tests

Some bitfield compares with zero are optimized to range tests, so
instead of X & ~(Bit - 1) != 0 what reaches ifcombine is X > (Bit - 1),
where Bit is a power of two and X is unsigned.

This patch recognizes this optimized form of masked compares, and
attempts to merge them like masked compares, which enables some more
field merging that a folder version of fold_truth_andor used to handle
without additional effort.

I haven't seen X & ~(Bit - 1) == 0 become X <= (Bit - 1), or X < Bit
for that matter, but it was easy enough to handle the former
symmetrically to the above.

The latter was also easy enough, and so was its symmetric counterpart,
X >= Bit, which is handled like X & ~(Bit - 1) != 0.
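The equivalences relied on above can be checked exhaustively for small values in C; the function names are illustrative only, and the `== 0` forms are simply the negations of these.

```c
#include <stdbool.h>

/* For unsigned X and BIT a power of two, these three tests agree:
   X & ~(BIT - 1) != 0  <=>  X > BIT - 1  <=>  X >= BIT.  */
bool masked_ne0 (unsigned x, unsigned bit) { return (x & ~(bit - 1)) != 0; }
bool range_gt (unsigned x, unsigned bit)   { return x > bit - 1; }
bool range_ge (unsigned x, unsigned bit)   { return x >= bit; }

/* Check every 16-bit X against every power of two below 2^16.  */
bool forms_agree (void)
{
  for (unsigned shift = 0; shift < 16; shift++)
    {
      unsigned bit = 1u << shift;
      for (unsigned x = 0; x <= 0xffff; x++)
        if (masked_ne0 (x, bit) != range_gt (x, bit)
            || range_gt (x, bit) != range_ge (x, bit))
          return false;
    }
  return true;
}
```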

for gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Accept incoming
mask.
(fold_truth_andor_for_ifcombine): Handle some compares with
powers of two, minus 1 or 0, like masked compares with zero.

for gcc/testsuite/ChangeLog

* gcc.dg/field-merge-15.c: New.

noncontiguous ifcombine: skip marking of non-SSA_NAMEs [PR117915]

When ifcombine_mark_ssa_name is called directly, rather than by
ifcombine_mark_ssa_name_walk, we need to check that name is an
SSA_NAME at the caller or in the function itself.  For convenience and
safety, I'm moving the checks from _walk to the implementation proper.

for  gcc/ChangeLog

PR tree-optimization/117915
* tree-ssa-ifcombine.cc (ifcombine_mark_ssa_name): Move
preconditions from...
(ifcombine_mark_ssa_name_walk): ... here.

for  gcc/testsuite/ChangeLog

PR tree-optimization/117915
* gcc.dg/pr117915.c: New.

ifcombine field merge: adjust testcases [PR118025]

There was a thinko in the testcase field-merge-9.c: I overcorrected it
for big-endian.

As a bonus, I'm including stdbool.h in field-merge-12.c, because I
used bool without the header there.

for gcc/testsuite/ChangeLog

PR testsuite/118025
* gcc.dg/field-merge-9.c (q): Drop overcorrection for
big-endian.
* gcc.dg/field-merge-12.c: Include stdbool.h.

ifcombine field merge: do not follow a second conversion [PR118046]

The testcase shows that conversions that would negatively impact the
ifcombine field merging implementation won't always have been
optimized out by the time we reach ifcombine.

There's probably room to support multiple conversions with extra
logic, but this workaround should avoid codegen errors until that
logic is figured out.
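A small C illustration (hypothetical function names) of why looking through more than one conversion is risky: narrowing to signed char and then widening to unsigned is not the same as a single conversion to the outermost type, because the intermediate step sign-extends bit 7.

```c
/* Two conversions: bit 7 of C becomes the sign bit and is extended.
   (The signed char conversion is implementation-defined for values
   above 127; GCC wraps modulo 256.)  */
unsigned two_conversions (unsigned char c) { return (unsigned) (signed char) c; }

/* One conversion: plain zero extension.  */
unsigned one_conversion (unsigned char c) { return (unsigned) c; }
```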

for gcc/ChangeLog

PR tree-optimization/118046
* gimple-fold.cc (decode_field_reference): Don't follow more
than one conversion.

for gcc/testsuite/ChangeLog

PR tree-optimization/118046
* gcc.dg/field-merge-14.c: New.

ifcombine field merge: stricten loads tests, swap compare to match

ACATS-4 ca11d02 exposed an error in the logic for recognizing and
identifying the inner object in decode_field_ref: a view-converting
load, inserted in a previous successful field merging operation, was
recognized by gimple_convert_def_p within decode_field_reference, and
as a result we took its operand as the expression, and failed to take
note of the load location.

Without that load, we couldn't compare vuses, and then we ended up
inserting a wider load before relevant parts of the object were
initialized.

This patch makes gimple_convert_def_p recognize loads only when
requested, and requires that either both or neither parts of a
potentially merged operand have associated loads.

As a bonus, it enables additional optimizations by swapping the
operands of the second compare when that makes left-hand operands
of both compares match.

for  gcc/ChangeLog

* gimple-fold.cc (gimple_convert_def_p): Reject load stmts
unless requested.
(decode_field_reference): Accept a converting load at the last
conversion matcher, subsuming the load identification.
(fold_truth_andor_for_ifcombine): Refuse to merge operands
when only one of them has an associated load stmt.  Swap
operands of one of the compares if that helps them match.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-13.c: New.

Daily bump.

Output the load address in backtraces for PIE executables on Darwin

This aligns Darwin with Linux and Windows.

gcc/ada/
PR target/117538
* libgnat/s-trasym.adb (Symbolic_Traceback): Prepend the load
address of the executable if it is not null.

Fix bootstrap failure on SPARC with -O3 -mvis3

This replaces the use of FAIL in the new vec_cmp[u] expanders by that of a
predicate for the operator, which is (apparently) required for the optabs
machinery to properly compute the set of supported vector comparisons.

gcc/
PR target/118096
* config/sparc/predicates.md (vec_cmp_operator): New predicate.
(vec_cmpu_operator): Likewise.
* config/sparc/sparc.md (vec_cmp<FPCMP:mode><P:mode>): Use the
vec_cmp_operator predicate instead of FAILing the expansion.
(vec_cmpu<FPCMP:mode><P:mode>): Likewise for vec_cmpu_operator.

libstdc++: Have std::addressof use __builtin_addressof

Rather than calling std::__addressof in std::addressof, we can directly
call __builtin_addressof and avoid one function call.

libstdc++-v3/ChangeLog:

* include/bits/move.h (std::addressof): Call __builtin_addressof.

[PR117248][LRA]: Fix calculation of conflict hard regs of pseudo

The first patch for PR117248 resulted in PR117299 (libgo failures on
arm), so this patch solves the problem in another way.

gcc/ChangeLog:

PR rtl-optimization/117248
* lra-lives.cc (process_bb_lives): Update conflict hard regs even
when clobber hard reg is not marked as dead.

ipcp don't propagate where not needed - fix uninit constructor

Remove the uninitialized empty constructor, which had been objected to.

gcc/ChangeLog:

* lto-cgraph.cc (lto_symtab_encoder_delete_node):
Declare var later when initialized.
* lto-streamer.h (struct lto_encoder_entry):
Remove empty constructor.

Revert "[PR117248][LRA]: Rewriting reg notes update and fix calculation of conflict hard regs of pseudo."

This reverts commit 75e7d1600f47859df40b2ac0feff5a71e0dbb040.

libstdc++: Adjust probabilities of hashmap loop conditions

We are currently generating a loop which has more comparisons than you'd
typically need, as the probabilities on the small size loop are such that
it assumes the likely case is that an element is not found.

This again generates a pattern that's harder for branch predictors to
follow, but also just generates more instructions for what is arguably
the typical case: that your hashtable contains the entry you are looking
for.

This patch adds a __builtin_expect in _M_find_before_node where at the moment
the loop is optimized for the case where we don't do any iterations.

A simple testcase is (compiled with -fno-split-path to simulate the loop
in libstdc++):

#include <stdbool.h>

bool foo (int **a, int n, int val, int *tkn)
{
    for (int i = 0; i < n; i++)
    {
        if (!a[i] || a[i]==tkn)
          return false;

        if (*a[i] == val)
          return true;
    }
    /* Falls off the end when no element matches; the caller must not
       use the result in that case.  */
}

which generates:

foo:
        cmp     w1, 0
        ble     .L1
        add     x1, x0, w1, uxtw 3
        b       .L4
.L9:
        ldr     w4, [x4]
        cmp     w4, w2
        beq     .L6
        cmp     x0, x1
        beq     .L1
.L4:
        ldr     x4, [x0]
        add     x0, x0, 8
        cmp     x4, 0
        ccmp    x4, x3, 4, ne
        bne     .L9
        mov     w0, 0
.L1:
        ret
.L6:
        mov     w0, 1
        ret

i.e. BB rotation makes it generate an unconditional branch to a conditional
branch. However, this method is only called when the size is above a certain
threshold, so it's likely that we have to do that first iteration.

Adding:

#include <stdbool.h>

bool foo (int **a, int n, int val, int *tkn)
{
    for (int i = 0; i < n; i++)
    {
        if (__builtin_expect(!a[i] || a[i]==tkn, 0))
          return false;

        if (*a[i] == val)
          return true;
    }
    /* Falls off the end when no element matches; the caller must not
       use the result in that case.  */
}

to indicate that we will likely do at least one iteration generates:

foo:
        cmp     w1, 0
        ble     .L1
        add     x1, x0, w1, uxtw 3
.L4:
        ldr     x4, [x0]
        add     x0, x0, 8
        cmp     x4, 0
        ccmp    x4, x3, 4, ne
        beq     .L5
        ldr     w4, [x4]
        cmp     w4, w2
        beq     .L6
        cmp     x0, x1
        bne     .L4
.L1:
        ret
.L5:
        mov     w0, 0
        ret
.L6:
        mov     w0, 1
        ret

which gives a further ~0-10% improvement on top of the previous patch.

In table form:

+-------------+---------------+-------+--------------------+-------------------+-----------------+
| benchmark   | Type          | Size  | Inline vs baseline | final vs baseline | final vs inline |
+-------------+---------------+-------+--------------------+-------------------+-----------------+
| find many   | uint64_t      | 11253 | -15.67%            | -22.96%           | -8.65%          |
| find many   | uint64_t      | 11253 | -16.74%            | -23.37%           | -7.96%          |
| find single | uint64_t      | 345   | -5.88%             | -11.54%           | -6.02%          |
| find many   | string        | 11253 | -4.50%             | -9.56%            | -5.29%          |
| find single | uint64_t      | 345   | -4.38%             | -9.41%            | -5.26%          |
| find single | shared string | 11253 | -6.67%             | -11.00%           | -4.64%          |
| find single | shared string | 11253 | -4.63%             | -9.03%            | -4.61%          |
| find single | shared string | 345   | -10.41%            | -14.44%           | -4.50%          |
| find many   | string        | 11253 | -3.41%             | -7.51%            | -4.24%          |
| find many   | shared string | 11253 | -2.30%             | -5.72%            | -3.50%          |
| find many   | string        | 13    | 2.86%              | -0.30%            | -3.07%          |
| find single | string        | 11253 | 4.47%              | 1.34%             | -3.00%          |
| find many   | custom string | 11253 | 0.25%              | -2.75%            | -2.99%          |
| find single | uint64_t      | 345   | 2.99%              | 0.01%             | -2.90%          |
| find single | shared string | 345   | -11.53%            | -13.67%           | -2.41%          |
| find single | uint64_t      | 11253 | 0.49%              | -1.59%            | -2.07%          |
+-------------+---------------+-------+--------------------+-------------------+-----------------+

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h
(_M_find_before_node): Make it likely that the map has at least one
entry and so we do at least one iteration.

libstdc++: Clear std::priority_queue after moving from it [PR118088]

We don't know what state an arbitrary sequence container will be in
after moving from it, so a moved-from std::priority_queue needs to clear
the moved-from container to ensure it doesn't contain elements that are
in an invalid order for the queue. An alternative would be to call
std::make_heap again to re-establish the rvalue queue's invariant, but
that could potentially cause an exception to be thrown. Just clearing it
so the sequence is empty seems safer and more likely to match user
expectations.

libstdc++-v3/ChangeLog:

PR libstdc++/118088
* include/bits/stl_queue.h (priority_queue(priority_queue&&)):
Clear the source object after moving from it.
(priority_queue(priority_queue&&, const Alloc&)): Likewise.
(operator=(priority_queue&&)): Likewise.
* testsuite/23_containers/priority_queue/118088.cc: New test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

lto: Remap node order for stability.

This patch adds remapping of node order for each LTO partition.
The resulting order preserves the relative order inside the partition,
but is independent of outside symbols. So if an LTO partition contains
an identical set of symbols, their remapped order will be stable
between compilations.

This stability is needed for Incremental LTO.
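A minimal C sketch of the remapping idea, assuming a partition's symbol orders are given as a plain array of distinct integers. This is not GCC's code, which keeps the remap in a hash_map in lto_symtab_encoder_d; `remap_partition_orders` and `cmp_order` are hypothetical names.

```c
#include <stdlib.h>

/* Three-way comparison of two ints for qsort/bsearch.  */
static int cmp_order (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Hypothetical: ORDERS holds the N distinct orders of the symbols in one
   partition.  REMAPPED[i] receives the rank of ORDERS[i] among them, so
   the result preserves relative order but ignores symbols outside the
   partition.  Returns 0 on success, -1 on allocation failure.  */
int remap_partition_orders (const int *orders, int *remapped, size_t n)
{
  if (n == 0)
    return 0;

  int *sorted = malloc (n * sizeof *sorted);
  if (!sorted)
    return -1;
  for (size_t i = 0; i < n; i++)
    sorted[i] = orders[i];
  qsort (sorted, n, sizeof *sorted, cmp_order);

  /* The rank of each order is its index in the sorted copy.  */
  for (size_t i = 0; i < n; i++)
    {
      const int *pos = bsearch (&orders[i], sorted, n, sizeof *sorted,
                                cmp_order);
      remapped[i] = (int) (pos - sorted);
    }
  free (sorted);
  return 0;
}
```

With orders {42, 7, 100, 13} the ranks are {2, 0, 3, 1}; any change that preserves the relative order of the partition's symbols leaves the ranks unchanged.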

gcc/ChangeLog:

* ipa-devirt.cc (ipa_odr_summary_write):
Add unused argument.
* ipa-fnsummary.cc (ipa_fn_summary_write): Likewise.
* ipa-icf.cc (sem_item_optimizer::write_summary): Likewise.
* ipa-modref.cc (modref_write): Likewise.
* ipa-prop.cc (ipa_prop_write_jump_functions): Likewise.
(ipcp_write_transformation_summaries): Likewise.
* ipa-sra.cc (ipa_sra_write_summary): Likewise.
* lto-cgraph.cc (lto_symtab_encoder_delete): Delete remap.
(lto_output_node): Remap order.
(lto_output_varpool_node): Likewise.
(output_cgraph_opt_summary): Add unused argument.
* lto-streamer-out.cc (produce_symbol_asm): Renamed. Use remapped order.
(produce_asm): Rename. New wrapper.
(output_function): Propagate remapped order.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(create_order_remap): New.
(lto_output): Create remap. Remap order.
* lto-streamer.h (struct lto_symtab_encoder_d): Remap hash_map.
(produce_asm): Add order argument.

Node clones share order.

Symbol order corresponds to the order in the source code.
For clones, the order is currently chosen arbitrarily as max order++.
It would be more consistent with the original purpose for clones to
share the order of the original node. This stabilizes clone order for
Incremental LTO.

Order is thus no longer unique, but uniqueness is not relied upon
outside of the previous patch, where uid can be used instead.
If a total order is needed, sorting by order and then by uid suffices.
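The tie-break mentioned above, order first and uid second, could look like this in C; `struct sym` is a hypothetical stand-in for a symtab node, not GCC's type.

```c
#include <stdlib.h>

/* Hypothetical symbol record: clones share ORDER with the original
   node, while UID stays unique.  */
struct sym { int order; int uid; };

/* Compare by order, then break ties by uid: a total order.  */
static int cmp_order_then_uid (const void *a, const void *b)
{
  const struct sym *x = a, *y = b;
  if (x->order != y->order)
    return (x->order > y->order) - (x->order < y->order);
  return (x->uid > y->uid) - (x->uid < y->uid);
}

void sort_symbols (struct sym *syms, size_t n)
{
  qsort (syms, n, sizeof *syms, cmp_order_then_uid);
}
```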

gcc/ChangeLog:

* cgraph.h (symbol_table::register_symbol):
Order can be already set.
* cgraphclones.cc (cgraph_node::create_clone):
Reuse order for clones.

ipa-strub: Replace cgraph_node order with uid.

ipa_strub_set_mode_for_new_functions uses node order as a unique,
ever-increasing identifier. This is better served by uid, since order
loses its uniqueness with the following patches.

gcc/ChangeLog:
* ipa-strub.cc (ipa_strub_set_mode_for_new_functions): Replace
order with uid.
(pass_ipa_strub_mode::execute): Likewise.

lto: Implement ltrans cache

This patch implements Incremental LTO as ltrans cache.

The cache stores pairs of ltrans input/output files together with the
input file hash. File locking is used to allow multiple GCC instances
to use the same cache.

Bootstrapped/regtested on x86_64-pc-linux-gnu
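A rough C sketch of a hash-keyed lookup in such a cache. The entry layout and function names are hypothetical, and FNV-1a is used here only as a simple, well-known hash; the commit does not say which hash the real cache uses.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: a simple stand-in hash over the input file contents.  */
uint64_t fnv1a64 (const unsigned char *data, size_t len)
{
  uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
  for (size_t i = 0; i < len; i++)
    {
      h ^= data[i];
      h *= 0x100000001b3ULL;            /* FNV prime */
    }
  return h;
}

/* Hypothetical cache entry: input hash plus input/output file names.  */
struct cache_entry { uint64_t hash; const char *input; const char *output; };

/* Return the cached output file for an input with the given contents,
   or NULL on a miss.  A robust cache would also compare the stored input
   file with the new one, since distinct inputs can share a hash.  */
const char *cache_lookup (const struct cache_entry *entries, size_t n,
                          const unsigned char *contents, size_t len)
{
  uint64_t h = fnv1a64 (contents, len);
  for (size_t i = 0; i < n; i++)
    if (entries[i].hash == h)
      return entries[i].output;
  return NULL;
}
```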

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* common.opt: New flags for configuring cache.
* lto-opts.cc (lto_write_options): Don't stream the flags.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.

Implement Lockfile.

This patch implements the lockfile used for incremental LTO.

Bootstrapped/regtested on x86_64-pc-linux-gnu
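One common POSIX way to implement such a lock is an advisory fcntl() record lock on the whole file. The sketch below is an assumption about the general technique, not the contents of lockfile.cc; `lock_file` and `unlock_file` are hypothetical names. fcntl() locks are held per process, which fits the case of separate GCC processes sharing one cache.

```c
#include <fcntl.h>
#include <unistd.h>

/* Take an exclusive advisory lock on PATH, blocking until it is
   available.  Returns the file descriptor (pass it to unlock_file)
   or -1 on error.  */
int lock_file (const char *path)
{
  int fd = open (path, O_RDWR | O_CREAT, 0666);
  if (fd < 0)
    return -1;

  struct flock fl = { 0 };
  fl.l_type = F_WRLCK;      /* exclusive (write) lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;             /* 0 means "to end of file": the whole file */

  if (fcntl (fd, F_SETLKW, &fl) == -1)   /* F_SETLKW waits for the lock */
    {
      close (fd);
      return -1;
    }
  return fd;
}

/* Release the lock and close the descriptor.  */
int unlock_file (int fd)
{
  struct flock fl = { 0 };
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  int ret = fcntl (fd, F_SETLK, &fl);
  close (fd);
  return ret;
}
```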

gcc/ChangeLog:

* Makefile.in: Add lockfile.o.
* lockfile.cc: New file.
* lockfile.h: New file.

Revert "PR81358: Enable automatic linking of libatomic."

This reverts commit e2f6ed54f75bbf8dd0292af90304890f06a9be17.

arm: Escape semicolon in thumb1.md

Without escaping the semicolon, the generated assembler output will not
match the expected assembler in the test cases.

Fixes Linaro CI reported regression on r15-6166-gb7e11b499922 in
https://linaro.atlassian.net/browse/GNU-1464.

gcc/ChangeLog:

* config/arm/thumb1.md (thumb1_cbz): Escape the semicolon.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

c++: Speed up compilation of large char array initializers when not using #embed

The following patch (again, on top of the #embed patchset) attempts to
optimize compilation of large {{{,un}signed ,}char,std::byte} array
initializers when not using #embed in the source.

Unlike the C patch which is done during the parsing of initializers this
is done when lexing tokens into an array, because C++ lexes all tokens
upfront and so by the time we parse the initializers we already have 16
bytes per token allocated (i.e. 32 extra compile time memory bytes per
one byte in the array).
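A back-of-the-envelope reading of those figures, assuming (as the 16- and 32-byte numbers imply) two tokens per array element, the number and the following comma, gives the token storage alone for a 100'000'000-element initializer:

```latex
2 \times 16\ \text{B} = 32\ \text{B per array byte}, \qquad
10^{8} \times 32\ \text{B} = 3.2 \times 10^{9}\ \text{B} \approx 3.2\ \text{GB}
```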

The drawback is again that it can result in worse locations for diagnostics
(-Wnarrowing, -Wconversion) when initializing signed char arrays with values
128..255.  I'm not really sure what to do about this, though; unlike the C
case, the locations would need to be preserved through reshape_init* and
perhaps until template instantiation.
For #embed there is just a single location_t (which could be the range of
the directive); for diagnostics we could perhaps extend it to say byte xyz
of the file embedded here or something like that.  For the optimization
done by this patch, though, we'd either need to bump the minimum limit at
which to try it, or, say, temporarily allocate a location_t array for each
byte and then clear it when we no longer need it, or something similar.
I've been using the same testcases as for C, with #embed of 100'000'000
bytes:

time ./cc1plus -quiet -O2 -o test4a.s2 test4a.c

real    0m0.972s
user    0m0.578s
sys     0m0.195s

With the xxd -i alternative of the same data, without this patch it
consumed around 13.2GB of RAM and

time ./cc1plus -quiet -O2 -o test4b.s4 test4b.c

real    3m47.968s
user    3m41.907s
sys     0m5.015s

and with this patch the same consumed around 3.7GB of RAM and

time ./cc1plus -quiet -O2 -o test4b.s3 test4b.c

real    0m24.772s
user    0m23.118s
sys     0m1.495s

2024-12-18  Jakub Jelinek  <jakub@redhat.com>

* parser.cc (cp_lexer_new_main): Attempt to optimize large sequences
of CPP_NUMBER with int type and values 0-255 separated by CPP_COMMA
into CPP_EMBED with RAW_DATA_CST u.value.

gimple-fold: Fix up decode_field_reference xor handling [PR118081]

The function comment says:
   *XOR_P is to be FALSE if EXP might be a XOR used in a compare, in which
   case, if XOR_CMP_OP is a zero constant, it will be overridden with *PEXP,
   *XOR_P will be set to TRUE, and the left-hand operand of the XOR will be
   decoded.  If *XOR_P is TRUE, XOR_CMP_OP is supposed to be NULL, and then the
   right-hand operand of the XOR will be decoded.
and the comment right above the xor_p handling says
  /* Turn (a ^ b) [!]= 0 into a [!]= b.  */
but I don't see anything that would actually check that the other operand is
0, in the testcase below it happily optimizes (a ^ 1) == 8 into a == 1.

The following patch adds that check.
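A small C check of the distinction (hypothetical function names): the XOR rewrite is only an equality when the other side of the compare is zero.

```c
#include <stdbool.h>

/* (a ^ b) == 0 is exactly a == b ...  */
bool xor_eq_zero (int a, int b) { return (a ^ b) == 0; }
bool plain_eq (int a, int b)    { return a == b; }

/* ... but with a non-zero right-hand side the rewrite is wrong:
   (a ^ 1) == 8 holds for a == 9, not for a == 1.  */
bool xor_eq_8 (int a) { return (a ^ 1) == 8; }
```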

Note, there are various other parts of the function I'm worried about, but
haven't had time to construct counterexamples yet.

One worrying thing is the
  /* Drop casts, only save the outermost type.  We need not worry about
     narrowing then widening casts, or vice-versa, for those that are not
     essential for the compare have already been optimized out at this
     point.  */
comment. While there are obviously various optimizations which do optimize
nested casts and the like, I'm not really sure it is safe to rely on them
always happening before this optimization: there are various options to
disable certain optimizations, and some IL could appear right before
ifcombine without yet having been optimized the way this routine expects.
Plus, the 3 casts are looked through in between various optimizations,
which might make those narrowing/widening or vice versa cases necessary.

Also, e.g. for the xor optimization, I think that with int a there is a
difference between
  (a ^ 0x23) == 0
and
  ((int) (((unsigned char) a) ^ (unsigned char) 0x23)) == 0
etc.
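Concretely, for a = 0x123 the two forms above disagree, because the unsigned char version only compares the low 8 bits of a (function names are illustrative):

```c
#include <stdbool.h>

/* Compares all bits of A against 0x23.  */
bool wide_xor (int a)   { return (a ^ 0x23) == 0; }

/* Compares only the low 8 bits of A against 0x23.  */
bool narrow_xor (int a)
{
  return ((int) (((unsigned char) a) ^ (unsigned char) 0x23)) == 0;
}
```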

Another thing I'm worried about is mixing the different patterns
together: there is the BIT_AND_EXPR handling, BIT_XOR_EXPR handling,
RSHIFT_EXPR handling and then load handling.
What if all 4 appear together, or 3 of them, 2 of them?
Is the xor optimization still valid if there is BIT_AND_EXPR in between?
I.e. instead of
  (a ^ 123) == 0
there is
  ((a ^ 123) & 234) == 0
?
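In C terms (illustrative names), an intervening mask means only the selected bits take part in the compare, so the XOR can no longer be read as a == 123; it becomes (a & 234) == (123 & 234), which a = 106 also satisfies.

```c
#include <stdbool.h>

/* (a ^ 123) == 0 means a == 123 ...  */
bool xor_only (int a)   { return (a ^ 123) == 0; }

/* ... but with a BIT_AND_EXPR in between, only the bits selected by
   234 are compared.  */
bool xor_masked (int a) { return ((a ^ 123) & 234) == 0; }
bool masked_eq (int a)  { return (a & 234) == (123 & 234); }
```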

2024-12-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118081
* gimple-fold.cc (decode_field_reference): Only set *xor_p to true
if *xor_cmp_op is integer_zerop.

* gcc.dg/pr118081.c: New test.

PR81358: Enable automatic linking of libatomic.

ChangeLog:
PR driver/81358
* Makefile.def: Add dependencies so libatomic is built before target
libraries are configured.
* Makefile.tpl: Export TARGET_CONFIGDIRS.
* configure.ac: Add libatomic to bootstrap_target_libs.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/ChangeLog:
PR driver/81358
* common.opt: New option -flink-libatomic.
* gcc.cc (LINK_LIBATOMIC_SPEC): New macro.
* config/gnu-user.h (GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC): Use
LINK_LIBATOMIC_SPEC.
* doc/invoke.texi: Document -flink-libatomic.
* configure.ac: Define TARGET_PROVIDES_LIBATOMIC.
* configure: Regenerate.
* config.in: Regenerate.

libatomic/ChangeLog:
PR driver/81358
* Makefile.am: Pass -fno-link-libatomic.
New rule all.
* configure.ac: Assert that CFLAGS is set and pass -fno-link-libatomic.
* Makefile.in: Regenerate.
* configure: Regenerate.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Co-authored-by: Matthew Malcolmson <mmalcolmson@nvidia.com>

OpenMP: Add declare variant's 'append_args' clause in C/C++

Add the append_args clause of 'declare variant' to C and C++,
fixing/improving diagnostics for the 'interop' clause and 'declare_variant'
clauses along the way.

Clean up dispatch handling in gimplify_call_expr a bit and
partially handle 'append_args'. (Namely, those parts that
do not require library calls, i.e. a dispatch construct
where the 'device' and 'interop' clauses have been specified.)

The sorry can be removed once an enum value like
omp_ipr_(ompx_gnu_)omp_device_num (cf. OpenMP Spec Issue 4451)
has been added on the runtime side, such that omp_get_interop_int
returns the device number of an interop object (as passed to
dispatch via the interop clause), and a call to GOMP_interop
has been added to create interop objects. Once available, only
a very localized change in gimplify_call_expr is required to
claim full support. Fortran parsing support is also still needed.

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_interop_t_p): Handle error_mark_node.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_init_modifiers): New;
split off from ...
(c_parser_omp_clause_init): ... here; call it.
(c_finish_omp_declare_variant): Parse 'append_args' clause.
(c_parser_omp_clause_interop): Set tree used/read.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Handle
append_args.
* parser.cc (cp_parser_omp_clause_init_modifiers): New;
split off from ...
(cp_parser_omp_clause_init): ... here; call it.
(cp_parser_omp_all_clauses): Replace interop parsing by
a call to ...
(cp_parser_omp_clause_interop): ... this new function;
set tree used/read.
(cp_finish_omp_declare_variant): Parse 'append_args' clause.
(cp_parser_omp_declare): Update comment.
* pt.cc (tsubst_attribute, tsubst_omp_clauses): Handle template
substitution also for declare variant's append_args clause,
using for 'init' the same code as for interop's init clause.

gcc/ChangeLog:

* gimplify.cc (gimplify_call_expr): Update for OpenMP's
append_args; cleanup of OpenMP's dispatch clause handling.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-2.c: Update dg-error msg.
* c-c++-common/gomp/dispatch-12.c: Likewise.
* c-c++-common/gomp/dispatch-11.c: Likewise and extend a bit.
* c-c++-common/gomp/append-args-1.c: New test.
* c-c++-common/gomp/append-args-2.c: New test.
* c-c++-common/gomp/append-args-3.c: New test.
* g++.dg/gomp/append-args-1.C: New test.
* g++.dg/gomp/append-args-2.C: New test.
* g++.dg/gomp/append-args-3.C: New test.

c++: Use type_id_in_expr_sentinel in 6 further spots in the parser

The following patch uses type_id_in_expr_sentinel in a few spots which
did it all manually.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

* parser.cc (cp_parser_postfix_expression): Use
type_id_in_expr_sentinel instead of manually saving+setting/restoring
parser->in_type_id_in_expr_p around cp_parser_type_id calls.
(cp_parser_has_attribute_expression): Likewise.
(cp_parser_cast_expression): Likewise.
(cp_parser_sizeof_operand): Likewise.

c++: Fix up pedantic handling of alignas [PR110345]

The following patch on top of the PR110345 P2552R3 series
emits pedantic pedwarns for alignas appertaining to incorrect entities.
As the middle-end and attribute exclusions look for "aligned" attribute,
the patch transforms alignas into "internal "::aligned attribute (didn't
use [[aligned (x)]] so that people can't type it that way).

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
gcc/c-family/
* c-common.h (attr_aligned_exclusions): Declare.
(handle_aligned_attribute): Likewise.
* c-attribs.cc (handle_aligned_attribute): No longer
static.
(attr_aligned_exclusions): Use extern instead of static.
gcc/cp/
* cp-tree.h (enum cp_tree_index): Add CPTI_INTERNAL_IDENTIFIER.
(internal_identifier): Define.
(internal_attribute_table): Declare.
* parser.cc (cp_parser_exception_declaration): Error on alignas
on exception declaration.
(cp_parser_std_attribute_spec): Turn alignas into internal
ns aligned attribute rather than gnu.
* decl.cc (initialize_predefined_identifiers): Initialize
internal_identifier.
* tree.cc (handle_alignas_attribute): New function.
(internal_attributes): New variable.
(internal_attribute_table): Likewise.
* cp-objcp-common.h (cp_objcp_attribute_table): Add
internal_attribute_table entry.
gcc/testsuite/
* g++.dg/cpp0x/alignas1.C: Add dg-options "".
* g++.dg/cpp0x/alignas2.C: Likewise.
* g++.dg/cpp0x/alignas7.C: Likewise.
* g++.dg/cpp0x/alignas21.C: New test.
* g++.dg/ext/bitfield9.C: Expect a warning.
* g++.dg/cpp2a/is-layout-compatible3.C: Add dg-options -pedantic.
Expect a warning.

c++: Add {,un}likely attribute further test coverage [PR110345]

Similarly for likely/unlikely attributes.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-likely1.C: New test.
* g++.dg/cpp0x/attr-unlikely1.C: New test.

c++: Add fallthrough attribute further test coverage [PR110345]

Similarly for fallthrough attribute. Had to add a second testcase because
the diagnostics for fallthrough not used within switch at all is done during
expansion and expansion won't happen if there are other errors in the
testcase.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-fallthrough1.C: New test.
* g++.dg/cpp0x/attr-fallthrough2.C: New test.

c++: Add carries_dependency further test coverage [PR110345]

This patch adds additional test coverage for the carries_dependency
attribute (unlike other attributes, the attribute actually isn't implemented
for real, so we warn even in the cases of valid uses because we ignore those
as well).

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* g++.dg/cpp0x/attr-carries_dependency2.C: New test.

c++: Handle attributes on exception declarations [PR110345]

This is a continuation of the series for the ignorability of standard
attributes.

I've added a test for assume attribute diagnostics appertaining to various
entities (mostly invalid) and while doing that, I've discovered that
attributes on exception declarations were mostly ignored, this patch
adds the missing cp_decl_attributes call and also in the
cp_parser_type_specifier_seq case differentiates between attributes and
std_attributes to be able to differentiate between attributes which apply
to the declaration using type-specifier-seq and attributes after the type
specifiers.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* parser.cc (cp_parser_type_specifier_seq): Chain cxx11_attribute_p
attributes after any type specifier in the is_declaration case
to std_attributes rather than attributes. Set also ds_attribute
or ds_std_attribute locations if not yet set.
(cp_parser_exception_declaration): Pass &type_specifiers.attributes
instead of NULL as last argument, call cp_decl_attributes.

* g++.dg/cpp0x/attr-assume1.C: New test.

c++: Diagnose attributes on class/enum declarations [PR110345]

The following testcase shows another issue where we just ignored
attributes without telling the user we did that.

If there are any declarators, the ignored attributes are diagnosed
in grokdeclarator etc., but if there are none
(and we don't error out, as we do e.g. on
int;
), the following patch emits diagnostics.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* decl.cc (check_tag_decl): Diagnose std_attributes.

* g++.dg/cpp0x/gen-attrs-86.C: New test.

c++: Handle enum attributes like class attributes [PR110345]

As the following testcase shows, cp_parser_decl_specifier_seq
was calling warn_misplaced_attr_for_class_type only for class types
and not for enum types, while check_tag_decl calls them for both
class and enum types.
Enum types are really the same case here, the attribute needs to go
before the type name to apply to all instances of the type.
Additionally, when warn_misplaced_attr_for_class_type is called, it
diagnoses something and so it is fine to drop the attributes then
on the floor, but in case it wasn't a type declaration, it silently
discarded the attributes, which the ignorability of standard attributes
paper does not allow.  In that case this patch adds them to
decl_specs->std_attributes and lets them be diagnosed later (e.g.
in grokdeclarator).

2024-12-18  Jakub Jelinek  <jakub@redhat.com>

PR c++/110345
* parser.cc (cp_parser_decl_specifier_seq): Call
warn_misplaced_attr_for_class_type for all OVERLOAD_TYPE_P
types, not just CLASS_TYPE_P.  When not calling
warn_misplaced_attr_for_class_type, don't clear attrs and
add them to decl_specs->std_attributes instead.

* g++.dg/cpp0x/gen-attrs-85.C: New test.

inline-asm: Add - constraint modifier support for toplevel extended asm [PR41045]

The following patch adds - constraint modifier support (only in toplevel asms),
which one can use to allow the i, s and n constraints to accept SYMBOL_REFs
even with -fpic.
So, the recommended way to mark a toplevel asm as defining some symbol
would be the ":" constraint (usually with the cc modifier in the pattern), while
to mark a toplevel asm as using some symbol (again, either a function or a
variable), one would use the "-s" constraint, again with the address of that
function or variable.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

PR c/41045
gcc/
* stmt.cc (parse_output_constraint, parse_input_constraint): Handle
- modifier.
* recog.h (raw_constraint_p): Declare.
* recog.cc (raw_constraint_p): New variable.
(asm_operand_ok, constrain_operands): Handle - modifier.
* common.md (i, s, n): For raw_constraint_p don't require
LEGITIMATE_PIC_OPERAND_P.
* doc/md.texi: Document - constraint modifier.
gcc/c/
* c-typeck.cc (build_asm_expr): Reject - constraint modifier inside
of a function.
gcc/cp/
* semantics.cc (finish_asm_stmt): Reject - constraint modifier inside
of a function.
gcc/testsuite/
* c-c++-common/toplevel-asm-4.c: Add missing %cc2 use in template, add
bar, x, &y operands with "-i" and "-s" constraints.
(x, y): New variables.
(bar): Declare.
* c-c++-common/toplevel-asm-7.c: New test.
* c-c++-common/toplevel-asm-8.c: New test.

inline-asm: Add support for cc operand modifier

As mentioned in the "inline asm: Add new constraint for symbol definitions"
patch description, while the c operand modifier is documented to:
Require a constant operand and print the constant expression with no punctuation.
it actually doesn't do that with -fpic at least on some targets and
has been behaving that way for at least 3 decades.
It prints the operand using output_addr_const if CONSTANT_ADDRESS_P is true,
but CONSTANT_ADDRESS_P can do all sorts of target specific checks.
And if it is false, it falls back to output_operand (operands[opnum], 'c');
which will on various targets just result in an error that it is invalid
modifier letter (weird because it is documented), on others like x86 or
alpha will handle the operand in some weird way if it is a comparison
and otherwise complain the argument isn't a comparison, on others like
arm perhaps do what the user wanted.

As I wrote, we are pretty much out of modifier letters because some targets
use a lot of them, and almost out of % punctuation chars (I think ` is left)
but right now punctuation chars aren't normally followed by operand number
anyway.

So, the following patch takes one of the generic letters (c) and adds an
extra modifier char after it, I chose cc, which behaves like c but just
always uses output_addr_const instead of falling back to the machine
dependent code.

2024-12-18 Jakub Jelinek <jakub@redhat.com>

* final.cc (output_asm_insn): Add support for cc operand modifier.
* doc/extend.texi (Generic Operand Modifiers): Document cc operand
modifier.
* doc/md.texi (@samp{:} in constraint): Mention the cc operand
modifier and add small example.

* c-c++-common/toplevel-asm-4.c: Don't use -fno-pie option.
Use cc modifier instead of c.
(v, w): Add extern keyword.
* c-c++-common/toplevel-asm-6.c: New test.

inline asm: Add new constraint for symbol definitions

The following patch on top of the PR41045 toplevel extended asm patch
allows marking inline asms (both toplevel and function-local, admittedly
it is less useful for the latter, so if you want, I can add restrictions)
as defining symbols, either functions or variables.

As most remaining constraint letters are used at least on some targets,
I'm using : as the new constraint.  It is similar to "s" in that it
wants CONSTANT_P && !CONST_SCALAR_INT_P, but
1) it specially requires an address of a function or variable declaration,
   so for functions the expected use is
void foo (void);
...
":" (foo)
or
":" (&foo)
and for variables (unless they are arrays)
extern int var;
...
":" (&var)
2) it makes no sense to say that either something is defined or it is
   used in a register or something similar, so the patch diagnoses if
   one attempts to mix it with other constraints; ":,:,:" is allowed
   just because one could be using 3 alternatives in some other operand
3) unlike "s", the constraint doesn't check LEGITIMATE_PIC_OPERAND_P for
   -fpic; even with -fpic one should be able to use it the same way
4) the cgraph portion needs to be really added later
5) and last but not least, I'm afraid %c0 print modifier isn't very
   good for printing it; it works fine without -fpic/-fpie, but 'c'
   modifier is handled as
                if (CONSTANT_ADDRESS_P (operands[opnum]))
                  output_addr_const (asm_out_file, operands[opnum]);
                else
                  output_operand (operands[opnum], 'c');
   and because at least on some arches like x86 CONSTANT_ADDRESS_P
   is redefined to do backend specific PIC mess, it will just
   output_operand and likely just be rejected (on x86 with an error
   that the argument is not a comparison)
   I guess on x86 one can use %p0 instead.
   But I'm afraid we are mostly out of generic modifiers,
   and targetm.asm_out.print_operand_punct_valid_p seems to use most
   of the punctuation characters as well.
   I think ` is unused, but wonder if we want to use up the last
   remaining letter that way, perhaps make %`<letter>0?
   Or extend the existing generic modifiers, keep %c0 behave as it
   does right now and make %cc0 be a 2 letter modifier which is
   PIC friendly and prints using output_addr_const anything that can
   be printed that way?  A follow-up patch implements the %cc0 version.

2024-12-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
* genpreds.cc (mangle): Add ':' mangling.
(add_constraint): Allow : constraint.
* common.md (:): New define_constraint.
* stmt.cc (parse_output_constraint): Diagnose "=:".
(parse_input_constraint): Handle ":" and diagnose invalid
uses.
* doc/md.texi (Simple Constraints): Document ":" constraint.
gcc/c/
* c-typeck.cc (build_asm_expr): Diagnose invalid ":" constraint
uses.
gcc/cp/
* semantics.cc (finish_asm_stmt): Diagnose invalid ":" constraint
uses.
gcc/testsuite/
* c-c++-common/toplevel-asm-4.c: New test.
* c-c++-common/toplevel-asm-5.c: New test.

libstdc++: Add inline keyword to _M_locate

In GCC 12 there was a ~40% regression in the performance of hashmap->find.

This regression came about accidentally:

Before GCC 12 the find function was small enough that IPA would inline it even
though it wasn't marked inline.  In GCC 12 an optimization was added to perform
a linear search when the number of entries in the hashmap is small.

This increased the size of the function enough that IPA would no longer inline it.
Inlining had two benefits:

1.  The return value is a reference, so it has to be returned and dereferenced
    even though the search loop may have already dereferenced it.
2.  The pattern is a hard one for branch predictors to track.  This causes
    a large number of branch misses if the value is immediately checked and
    branched on, i.e. if (a != m.end()), which is a common pattern.
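For reference, the lookup idiom mentioned in point 2 looks like the sketch below: find() followed immediately by a comparison against end(). The helper name contains_key is hypothetical. The idea is that once _M_locate is inlined, the compiler can see the search loop and this comparison together.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper showing the common "find, then branch on end()" idiom.
inline bool contains_key(const std::unordered_map<std::string, int>& m,
                         const std::string& key)
{
  auto it = m.find(key);  // calls into _M_locate inside libstdc++
  return it != m.end();   // immediate check that the branch predictor sees
}
```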

The patch fixes both these issues by adding the inline keyword to _M_locate
to allow the inliner to consider inlining again.

This and the other patches have been run through several benchmarks where
the size, number of elements searched for and type (reference vs value) etc
were tested.

The change shows no statistically significant regression, and an average find
improvement of ~27%, with improvements ranging from ~10% to ~60%.  A selection
of the results:

+-----------+--------------------+-------+----------+
| Group     | Benchmark          | Size  | % Inline |
+-----------+--------------------+-------+----------+
| Find      | unord<uint64_t     | 11274 | 53.52%   |
| Find      | unord<uint64_t     | 11254 | 47.98%   |
| Find Mult | unord<uint64_t     | 12    | 47.62%   |
| Find Mult | unord<std::string  | 12    | 44.94%   |
| Find Mult | unord<std::string  | 10    | 44.89%   |
| Find Mult | unord<uint64_t     | 11    | 40.90%   |
| Find Mult | unord<uint64_t     | 352   | 30.57%   |
| Find      | unord<uint64_t     | 351   | 28.27%   |
| Find Mult | unord<uint64_t     | 342   | 26.80%   |
| Find      | unord<std::string  | 12    | 25.66%   |
| Find Mult | unord<std::string  | 352   | 23.12%   |
| Find      | unord<std::string  | 13    | 20.36%   |
| Find Mult | unord<std::string  | 355   | 19.23%   |
| Find      | unord<std::string  | 353   | 18.59%   |
| Find      | unord<uint64_t     | 350   | 15.43%   |
| Find      | unord<std::string  | 11260 | 11.80%   |
| Find      | unord<std::string  | 352   | 11.12%   |
| Find      | unord<std::string  | 11262 | 9.97%    |
+-----------+--------------------+-------+----------+

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Inline _M_locate.

LoongArch: Add crc tests

gcc/testsuite/ChangeLog:

* g++.target/loongarch/crc.C: New test.
* g++.target/loongarch/crc-scan.C: New test.

LoongArch: Combine xor and crc instructions

For a textbook-style CRC implementation:

    uint32_t crc = 0xffffffffu;
    for (size_t k = 0; k < len; k++)
      {
        crc ^= data[k];
        for (int i = 0; i < 8 * sizeof (T); i++)
          if (crc & 1)
            crc = (crc >> 1) ^ poly;
          else
            crc >>= 1;
      }
    return crc;

The generic code reports:

    Data and CRC are xor-ed before for loop.  Initializing data with 0.

resulting in:

    ld.bu     $t1, $a0, 0
    xor       $t0, $t0, $t1
    crc.w.b.w $t0, $zero, $t0

But it's just better to use

    ld.bu     $t1, $a0, 0
    crc.w.b.w $t0, $t1, $t0

instead.  Implement this optimization now.
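A runnable C version of the textbook loop above helps show why the xor can be folded. This is a sketch: it assumes the reflected IEEE CRC-32 polynomial 0xEDB88320 and adds the conventional final inversion, which the excerpt omits. Each byte step starts with crc ^= byte, so feeding (crc ^ data, 0) gives the same result as feeding (crc, data). That is the equivalence the combined crc.w.b.w form relies on.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u /* reflected CRC-32 polynomial, assumed */

/* One byte step of a reflected CRC: xor the byte in, then shift 8 times. */
static uint32_t crc32_byte (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY : crc >> 1;
  return crc;
}

/* Bitwise CRC-32 over a buffer, with the standard init and final xor. */
static uint32_t crc32_bitwise (const uint8_t *data, size_t len)
{
  uint32_t crc = 0xffffffffu;
  for (size_t k = 0; k < len; k++)
    crc = crc32_byte (crc, data[k]);
  return ~crc;
}
```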

gcc/ChangeLog:

* config/loongarch/loongarch.md (*crc_combine): New
define_insn_and_split.

LoongArch: Add CRC expander to generate faster CRC

64-bit LoongArch has native CRC instructions for two specific
polynomials. For other polynomials or 32-bit, use the generic
table-based approach but optimize bit reversing.
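For comparison, a conventional table-driven software CRC-32 looks roughly like the sketch below. It again assumes the reflected polynomial 0xEDB88320. The code GCC actually generates for other polynomials is not necessarily structured this way; this only shows what a table-based approach computes.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Precompute the CRC of every possible byte value. */
static void crc32_init_table (void)
{
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n;
      for (int k = 0; k < 8; k++)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      crc_table[n] = c;
    }
}

/* Process one byte per table lookup instead of one bit per iteration. */
static uint32_t crc32_table (const uint8_t *data, size_t len)
{
  uint32_t crc = 0xffffffffu;
  for (size_t k = 0; k < len; k++)
    crc = crc_table[(crc ^ data[k]) & 0xffu] ^ (crc >> 8);
  return ~crc;
}
```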

gcc/ChangeLog:

* config/loongarch/loongarch.md (crc_rev<mode:SUBDI>si4): New
define_expand.

LoongArch: Add bit reverse operations

LoongArch supports native bit reverse operation for QI, SI, DI, and for
HI we can expand it into a shift and a bit reverse in word_mode.

I was reluctant to add them because until PR50481 is fixed these
operations will be useless.  But it now turns out we can use them
to optimize the bit-reversing CRC calculation if it is recognized by the
generic CRC pass.  So add them in preparation for the next patch adding CRC
expanders.
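The HImode trick described above can be checked in portable C: reverse the bits of the zero-extended halfword in a wider word, then shift the result down. This sketch uses a 32-bit word for brevity and a software stand-in for the native instruction. On 64-bit LoongArch, word_mode is DImode, so the shift would be 48 instead of 16.

```c
#include <stdint.h>

/* Software stand-in for a native 32-bit bit-reverse instruction. */
static uint32_t rbit32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}

/* HImode reverse: bit i of x lands at bit 31 - i, so shift right by 16. */
static uint16_t rbit16 (uint16_t x)
{
  return (uint16_t) (rbit32 (x) >> 16);
}
```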

gcc/ChangeLog:

* config/loongarch/loongarch.md (@rbit<mode:GPR>): New
define_insn template.
(rbitsi_extended): New define_insn.
(rbitqi): New define_insn.
(rbithi): New define_expand.

LoongArch: Remove QHSD and use QHWD instead

QHSD and QHWD are basically the same thing, but QHSD will be incorrect
when we start to add LA32 support. So it's just better to always use
QHWD.

gcc/ChangeLog:

* config/loongarch/loongarch.md (QHSD): Remove.
(loongarch_<crc>_w_<size>_w): Use QHWD instead of QHSD.
(loongarch_<crc>_w_<size>_w_extended): Likewise.

libstdc++: Add missing character to __to_wstring_numeric map

The mapping from char to wchar_t needs to handle 'i' and 'I' but those
were absent from the table that is used for some non-ASCII encodings.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__to_wstring_numeric): Add 'i'
and 'I' to mapping.

libstdc++: Call regex_traits::transform_primary() only when necessary [PR98723]

This is both a performance optimization and a partial fix for PR 98723.

This commit fixes the issue for bracket expressions that do not depend
on the locale's collation facet. Examples:

* Character ranges ([a-z]) when std::regex::collate is not set
* Character classes ([:alnum:])
* Individual characters ([abc])
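Below is a small C++ check of the three kinds of bracket expressions listed above. It uses the default ECMAScript grammar without std::regex::collate. The helper name matches is hypothetical. Whether transform_primary() is called is an internal detail this sketch cannot observe; it only shows the matching results these patterns should keep producing.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does the whole string match the pattern?
inline bool matches(const std::string& s, const std::string& pattern)
{
  return std::regex_match(s, std::regex(pattern));
}
```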

Signed-off-by: Luca Bacci <luca.bacci982@gmail.com>
libstdc++-v3/ChangeLog:

PR libstdc++/98723
* include/bits/regex_compiler.tcc (_BracketMatcher::_M_apply):
Only use transform_primary when an equivalence set is used.

Documentation: Fix paste-o in recent OpenMP/OpenACC patch

gcc/ChangeLog
* doc/extend.texi (OpenACC): Fix paste-o.

c++: modules: Fix 32-bit overflow with 64-bit location_t [PR117970]

With the move to 64-bit location_t in r15-6016, I missed a spot in module.cc
where a location_t was still being stored in a 32-bit int. Fixed.
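The class of bug fixed here can be sketched in plain C: a 64-bit location value silently loses its high bits when it is stored in a 32-bit integer. The typedef and function names below are hypothetical stand-ins, not the module.cc code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit location_t. */
typedef uint64_t loc64_t;

/* Buggy pattern: a 32-bit slot drops the high 32 bits. */
static uint32_t store_in_u32 (loc64_t loc)
{
  return (uint32_t) loc;
}

/* Fixed pattern: keep the value in a type at least as wide. */
static loc64_t store_in_u64 (loc64_t loc)
{
  return loc;
}
```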

The xtreme-header* tests in modules.exp were still passing fine on lots of
architectures that were tested (x86-64, i686, aarch64, sparc, riscv64), but
the PR shows that they were failing in some particular risc-v multilib
configurations. They pass now.

gcc/cp/ChangeLog:

PR c++/117970
* module.cc (module_state::read_ordinary_maps): Change argument to
line_map_uint_t instead of unsigned int.

Daily bump.

c++: print NONTYPE_ARGUMENT_PACK [PR118073]

This PR points out that we're not pretty-printing NONTYPE_ARGUMENT_PACK
so the compiler emits the ugly:

'nontype_argument_pack' not supported by dump_expr<expression error>>

Fixed thus. I've wrapped the elements of the pack in { } because that's
what cxx_pretty_printer::expression does.

PR c++/118073

gcc/cp/ChangeLog:

* error.cc (dump_expr) <case NONTYPE_ARGUMENT_PACK>: New case.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/arg-pack1.C: New test.

libstdc++: Fix -Wparentheses warning in Debug Mode macro

libstdc++-v3/ChangeLog:

* include/debug/safe_local_iterator.h (_GLIBCXX_DEBUG_VERIFY_OPERANDS):
Add parentheses to avoid -Wparentheses warning.

libstdc++: Fix std::deque::insert(pos, first, last) undefined behaviour [PR118035]

Inserting an empty range into a std::deque results in undefined calls to
either std::copy, std::copy_backward, std::move, or std::move_backward.
We call those algos with invalid arguments where the output range is the
same as the input range, e.g. std::copy(first, last, first) which
violates the preconditions for the algorithms.

This fix simply returns early if there's nothing to insert. Most callers
already ensure that we don't even call _M_range_insert_aux with an empty
range, but some callers don't. Rather than checking for n == 0 in each
of the callers, this just does the check once and uses __builtin_expect
to treat empty insertions as unlikely.
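The affected call is an ordinary empty-range insertion like the one below, where first == last. The function name is hypothetical. With the fix, the deque should simply return early and stay unchanged.

```cpp
#include <deque>
#include <vector>

// Hypothetical reproduction: insert the empty range [v.begin(), v.begin()).
inline std::deque<int> insert_empty_range()
{
  std::deque<int> d{1, 2, 3};
  std::vector<int> v{4, 5};
  d.insert(d.begin() + 1, v.begin(), v.begin());  // nothing to insert
  return d;
}
```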

libstdc++-v3/ChangeLog:

PR libstdc++/118035
* include/bits/deque.tcc (_M_range_insert_aux): Return
immediately if inserting an empty range.
* testsuite/23_containers/deque/modifiers/insert/118035.cc: New
test.

Documentation: Make OpenMP/OpenACC docs easier to find [PR26154]

PR c/26154 is one of our oldest documentation issues.  The only
discussion of OpenMP support in the GCC manual is buried in the "C
Dialect Options" section, with nothing at all under "Extensions".  The
Fortran manual does have separate sections for OpenMP and OpenACC
extensions so I have copy-edited/adapted that text for similar sections
in the GCC manual, as well as breaking out the OpenMP and OpenACC options
into their own section (they apply to all of C, C++, and Fortran).

I also updated the information about what versions of OpenMP and
OpenACC are supported and removed some redundant text from the Fortran
manual to prevent it from getting out of sync on future updates, and
inserted some cross-references to the new sections elsewhere.

gcc/c-family/ChangeLog
PR c/26154
* c.opt.urls: Regenerated.

gcc/ChangeLog
PR c/26154
* common.opt.urls: Regenerated.
* doc/extend.texi (C Extensions): Adjust menu for new sections.
(Attribute Syntax): Mention OpenMP directives.
(Pragmas): Mention OpenMP and OpenACC directives.
(OpenMP): New section.
(OpenACC): New section.
* doc/invoke.texi (Invoking GCC): Adjust menu for new section.
(Option Summary): Move OpenMP and OpenACC options to their own
category.
(C Dialect Options): Move documentation for -foffload, -fopenacc,
-fopenacc-dim, -fopenmp, -fopenmp-simd, and
-fopenmp-target-simd-clone to...
(OpenMP and OpenACC Options): ...this new section.  Light
copy-editing of the option descriptions.

gcc/fortran/ChangeLog:
PR c/26154
* gfortran.texi (Standards): Remove redundant info about
OpenMP/OpenACC standard support.
(OpenMP): Copy-editing and update version info.
(OpenACC): Likewise.
* lang.opt.urls: Regenerated.

middle-end/118062 - bogus lowering of vector compares

The generic expand_vector_piecewise routine supports lowering of
a vector operation to vector operations of smaller size.  When
computing the extract position from the larger vector it uses the
element size in bits of the original result vector to determine
the number of elements in the smaller vector.  That is wrong when
lowering a compare as the vector element size of a bool vector
does not have to agree with that of the compare operand.  The
following simplifies this, fixing the error.

PR middle-end/118062
* tree-vect-generic.cc (expand_vector_piecewise): Properly
compute delta.

c++: ICE initializing array of aggrs [PR117985]

This crash started with my r12-7803 but I believe the problem lies
elsewhere.

build_vec_init has cleanup_flags whose purpose is -- if I grok this
correctly -- to avoid destructing an object multiple times.  Let's
say we are initializing an array of A.  Then we might end up in
a scenario similar to initlist-eh1.C:

  try
    {
      call A::A in a loop
      // #0
      try
        {
          call a fn using the array
        }
      finally
        {
          // #1
          call A::~A in a loop
        }
    }
  catch
    {
      // #2
      call A::~A in a loop
    }

cleanup_flags makes us emit a statement like

  D.3048 = 2;

at #0 to disable performing the cleanup at #2, since #1 will take
care of the destruction of the array.

But if we are not emitting the loop because we can use a constant
initializer (and use a single { a, b, ...}), we shouldn't generate
the statement resetting the iterator to its initial value.  Otherwise
we crash in gimplify_var_or_parm_decl because it gets the stray decl
D.3048.

PR c++/117985

gcc/cp/ChangeLog:

* init.cc (build_vec_init): Pop CLEANUP_FLAGS if we're not
generating the loop.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-array23.C: New test.
* g++.dg/cpp0x/initlist-array24.C: New test.

[PATCH] RISC-V: optimization on checking certain bits set ((x & mask) == val)

The patch optimizes code generation for comparisons of the form
(X & C1) == C2 by converting them to (X | ~C1) == (C2 | ~C1).
C1 is a constant that requires li and addi to be loaded,
while ~C1 requires a single lui instruction.
As the values of C1 and C2 are not visible within
the equality expression, a plus pattern is matched instead.
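The identity behind the transform can be checked in C. This sketch uses 32-bit arithmetic and assumes C2 has no bits outside C1; the exact preconditions of the pattern are in riscv.md and are not reproduced here. For example, C1 = 0xFFFF0FFF needs lui + addi on RV32, while ~C1 = 0x0000F000 is a single lui.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original form: do the bits of x selected by c1 equal c2?  */
static bool masked_eq (uint32_t x, uint32_t c1, uint32_t c2)
{
  return (x & c1) == c2;
}

/* Rewritten form: force all bits outside c1 to 1 on both sides, so only
   bits inside c1 can differ.  Equivalent when (c2 & ~c1) == 0.  */
static bool ored_eq (uint32_t x, uint32_t c1, uint32_t c2)
{
  return (x | ~c1) == (c2 | ~c1);
}
```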

PR target/114087

gcc/ChangeLog:

* config/riscv/riscv.md (*lui_constraint<ANYI:mode>_and_to_or): New pattern

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr114087-1.c: New test.

RISC-V: Remove svvptc from riscv-ext-bitmask.def

There should be no svvptc in the riscv-ext-bitmask.def file since it has
not yet been added to the RISC-V C API Specification or the Linux
hwprobe. And there is no need for userspace software to know that this
extension exists. So remove it from the riscv-ext-bitmask.def file.

Fixes: e4f4b2dc08 ("RISC-V: Minimal support for svvptc extension.")
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Remove svvptc.

testsuite: arm: Mark pr81812.C as xfail for thumb1

Test fails for Cortex-M0 with:

.../pr81812.C:6:8: error: generic thunk code fails for method 'virtual void ChildNode::_ZTv0_n12_NK9ChildNode5errorEz(...) const' which uses '...'

According to PR108277, it's expected that thumb1 targets do not
support empty virtual functions with ellipsis.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr81812.C: Add xfail for thumb1.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

[PATCH v2 2/2] RISC-V: Add Tenstorrent Ascalon 8 wide architecture

This adds the Tenstorrent Ascalon 8 wide architecture (tt-ascalon-d8)
to the list of known cores.

gcc/ChangeLog:

* config/riscv/riscv-cores.def: Add tt-ascalon-d8.
* config/riscv/riscv.cc (tt_ascalon_d8_tune_info): New.
* doc/invoke.texi (RISC-V): Add tt-ascalon-d8 to -mcpu.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-tt-ascalon-d8.c: New test.

[PATCH v2 1/2] RISC-V: Document thead-c906, xiangshan-nanhu, and generic-ooo

gcc/ChangeLog
* doc/invoke.texi (RISC-V): Add thead-c906, xiangshan-nanhu to
-mcpu, add generic-ooo and remove thead-c906 from -mtune.

testsuite: arm: Add -mtune to all arm_cpu_* effective targets

Fixes Linaro CI reported regression on r15-6164-gbdf75257aad2 in
https://linaro.atlassian.net/browse/GNU-1463.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Added corresponding -mtune= option
for each of the arm_cpu_* effective targets.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

RISC-V: Add new constraint R for register even-odd pairs

Although this constraint is not currently used for any instructions, it is very
useful for custom instructions. Additionally, some new standard extensions
(not yet upstream), such as `Zilsd` and `Zclsd`, are potential users of this
constraint. Therefore, I believe there is sufficient justification to add it
now.

gcc/ChangeLog:

* config/riscv/constraints.md (R): New constraint.
* doc/md.texi: Document new constraint `R`.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/constraint-R.c: New.

RISC-V: Implement N modifier for printing the register number rather than the register name

Add the modifier `N` to print the raw encoding of a register.  This is used
with `.insn <length>, <encoding>`, where the user wants to pass
a value to the instruction in a known register, but where the
instruction doesn't follow the existing instruction formats, so the
assembly parser is not expecting a register name, just a raw integer.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add N.
* doc/extend.texi: Document N.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/modifier-N-fpr.c: New.
* gcc.target/riscv/modifier-N-vr.c: New.
* gcc.target/riscv/modifier-N.c: New.

RISC-V: Rename internal operand modifier N to n

We propose using N for printing the register encoding number,
so rename the existing internal operand modifier `N` to `n`.

gcc/ChangeLog:

* config/riscv/corev.md (*cv_branch<mode>): Update modifier.
(*branch<mode>): Ditto.
* config/riscv/riscv.cc (riscv_print_operand): Update modifier.
* config/riscv/riscv.md (*branch<mode>): Update modifier.

RISC-V: Add cr and cf constraint

gcc/ChangeLog:

* config/riscv/constraints.md (cr): New.
(cf): New.
* config/riscv/riscv.h (reg_class): Add RVC_GR_REGS and
RVC_FP_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* doc/md.texi: Document cr and cf constraint.
* config/riscv/riscv.cc (riscv_regno_to_class): Update
FP_REGS to RVC_FP_REGS since it is a smaller set.
(riscv_secondary_memory_needed): Handle RVC_FP_REGS.
(riscv_register_move_cost): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/constraint-cf-zfinx.c: New.
* gcc.target/riscv/constraint-cf.c: New.
* gcc.target/riscv/constraint-cr.c: New.

RISC-V: Rename constraint c0* to k0*

Rename those constraints since we want to define other constraints starting
with `c`; those constraints are internal and undocumented, so it's fine to
rename them.

gcc/ChangeLog:

* config/riscv/constraints.md (c01): Rename to...
(k01): ...this.
(c02): Rename to...
(k02): ...this.
(c03): Rename to...
(k03): ...this.
(c04): Rename to...
(k04): ...this.
(c08): Rename to...
(k08): ...this.
* config/riscv/corev.md (riscv_cv_simd_add_h_si): Update
constraints.
(riscv_cv_simd_sub_h_si): Ditto.
(riscv_cv_simd_cplxmul_i_si): Ditto.
(riscv_cv_simd_subrotmj_si): Ditto.
* config/riscv/riscv-v.cc (splat_to_scalar_move_p): Update
constraints.
* config/riscv/vector-iterators.md (stride_load_constraint):
Update constraints.
(stride_store_constraint): Ditto.

ipa: Improve how we derive value ranges from IPA invariants

I believe that the current function ipa_range_set_and_normalize lacks
a test of whether the base of an ADDR_EXPR really cannot be NULL, so
this patch adds it.  Moreover, I never liked the name as I do not
think it makes the value of ranges any more normal but rather just
special-cases non-zero ip_invariant pointers.  Therefore, I have given
it a different name and moved it to a .cc file; our LTO bootstrap
should inline (and/or split) it if necessary anyway.

Because, as Honza correctly pointed out, deriving non-NULLness from a
pointer depends on flag_delete_null_pointer_checks which is an
optimization flag and thus depends on a given function, in this
version of the patch ipa_get_range_from_ip_invariant gets a
context_node parameter for that purpose.  This then needs to be used
within symtab_node::nonzero_address which gets a special overload in
which the value of the flag can be provided as a parameter.

gcc/ChangeLog:

2024-12-11  Martin Jambor  <mjambor@suse.cz>

* cgraph.h (symtab_node): Add a new overload of nonzero_address.
* symtab.cc (symtab_node::nonzero_address): Add a new overload with a
parameter for delete_null_pointer_checks.  Make the original overload
call the new one, which now retains the actual implementation.
* ipa-prop.h (ipa_get_range_from_ip_invariant): Declare.
(ipa_range_set_and_normalize): Remove.
* ipa-prop.cc (ipa_get_range_from_ip_invariant): New function.
(ipa_range_set_and_normalize): Remove.
* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Add a new parameter
context_node. Use ipa_get_range_from_ip_invariant instead of
ipa_range_set_and_normalize and pass to it the new parameter.
(ipa_value_range_from_jfunc): Pass cs->caller as the context_node to
ipa_vr_intersect_with_arith_jfunc.
(propagate_vr_across_jump_function): Likewise.
(ipa_get_range_from_ip_invariant): New function.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Use
ipa_get_range_from_ip_invariant instead of ipa_range_set_and_normalize

ipa: Better value ranges for pointer integer constants

When looking into cases where we know an actual argument of a call is
a constant but we don't generate a singleton value-range for the jump
function, I found out that the special handling of pointer constants
does not work well for constant zero pointer values.  In fact the code
only attempts to see if it can figure out that an argument is not zero
and if it can figure out any alignment information.

With this patch, we try to use the value_range that ranger can give us
in the jump function if we can and we query ranger for all kinds of
arguments, not just SSA_NAMEs (and so also pointer integer constants).
If we cannot figure out a useful range we fall back again on figuring
out non-NULLness with tree_single_nonzero_warnv_p.

With this patch, we generate

  [prange] struct S * [0, 0] MASK 0x0 VALUE 0x0

instead of for example:

  [prange] struct S * [0, +INF] MASK 0xfffffffffffffff0 VALUE 0x0

for a zero constant passed in a call.

If you are wondering why we check whether the value range obtained
from range_of_expr is undefined even when the function returns true,
that is because this can apparently happen for default-definition
SSA_NAMEs.

gcc/ChangeLog:

2024-11-15  Martin Jambor  <mjambor@suse.cz>

* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Try harder to
use the value range obtained from ranger for pointer values.

ipa: Skip widening type conversions in jump function constructions

Originally, we did not stream any formal parameter types into WPA and
were generally very conservative when it came to type mismatches in
IPA-CP.  Over time, mismatches that occurred in real code and blew up in
WPA made us much more resilient and also led us to stream the types of
the parameters, which we now use commonly.

With that information, we can safely skip conversions when looking at
the IL from which we build jump functions and then simply fold convert
the constants and ranges to the resulting type, as long as we are
careful that performing the corresponding folding of constants gives
the corresponding results.  In order to do that, we must ensure that
the old value can be represented in the new one without any loss.
With this change, we can nicely propagate non-NULLness in IPA-VR as
demonstrated with the new test case.
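The "no loss" condition can be illustrated with a generic rule for integer types, stated as a sketch. The struct and function names are hypothetical, and this is not the logic of skip_a_safe_conversion_op itself. A conversion preserves every value when it keeps signedness and does not narrow, or when it goes from unsigned to signed with at least one extra bit for the sign.

```c
#include <stdbool.h>

/* Hypothetical description of an integer type.  */
struct int_type { unsigned precision; bool is_signed; };

/* True if every value of FROM is representable in TO.  */
static bool conversion_is_lossless (struct int_type from, struct int_type to)
{
  if (from.is_signed && !to.is_signed)
    return false;                          /* negative values are lost */
  if (from.is_signed == to.is_signed)
    return to.precision >= from.precision; /* same signedness: no narrowing */
  return to.precision > from.precision;    /* unsigned -> signed: need a sign bit */
}
```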

I have gone through all other uses of (all components of) jump
functions which could be affected by this and verified they do indeed
check types and can handle mismatches.

gcc/ChangeLog:

2024-12-11  Martin Jambor  <mjambor@suse.cz>

* ipa-prop.cc: Include vr-values.h.
(skip_a_safe_conversion_op): New function.
(ipa_compute_jump_functions_for_edge): Use it.

gcc/testsuite/ChangeLog:

2024-11-01  Martin Jambor  <mjambor@suse.cz>

* gcc.dg/ipa/vrp9.c: New test.

c++: Diagnose earlier non-static data members with cv containing class type [PR116108]

In r10-6457 (the PR92593 fix), a check was added to reject
earlier non-static data members with current_class_type in templates,
as the deduction can then result in endless recursion in reshape_init.
It fixed the
template <class T>
struct S { S s = 1; };
S t{2};
crashes, but as the following testcase shows, it didn't catch cases where
there are cv qualifiers on the non-static data member.

Fixed by using TYPE_MAIN_VARIANT.

2024-12-17 Jakub Jelinek <jakub@redhat.com>

PR c++/116108
gcc/cp/
* decl.cc (grokdeclarator): Pass TYPE_MAIN_VARIANT (type)
rather than type to same_type_p when checking if the non-static
data member doesn't have current class type.
gcc/testsuite/
* g++.dg/cpp1z/class-deduction117.C: New test.

Fortran: Fix associate with derived type array constructor [PR117347]

gcc/fortran/ChangeLog:

PR fortran/117347

* primary.cc (gfc_match_varspec): Add array constructors for
guessing their type like with unresolved function calls.

gcc/testsuite/ChangeLog:

* gfortran.dg/associate_71.f90: New test.

Daily bump.

Update cpplib sr.po

* sr.po: Update.

i386: Fix tabs vs. spaces in mmx.md

gcc/ChangeLog:

* config/i386/mmx.md: Fix tabs vs. spaces.

i386: Add HImode to VALID_SSE2_REG_MODE

Move explicit HImode handling for SSE2 XMM regnos from
ix86_hard_regno_mode_ok to VALID_SSE2_REG_MODE.

No functional change.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_hard_regno_mode_ok):
Remove explicit HImode handling for SSE2 XMM regnos.
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add HImode.

testsuite: Force max-completely-peeled-insns=300 for CRIS, PR118055

This handles fallout from r15-6097-gee2f19b0937b5e.  A brief
analysis shows that the metric used in that code is computed
by estimate_move_cost, differentiating on the target macro
MOVE_MAX_PIECES (which defaults to MOVE_MAX) which for most
"32-bit targets" is 4 and for "64-bit targets" is 8.  There
are some outliers, like pru, with MOVE_MAX set to 8 but
counting as a 32-bit target.

So, the main difference for this test-case, which is heavy
on 64-bit moves (most targets have "double" mapped to IEEE
64-bit), is between "32-bit" and "64-bit", with the cost up
to twice for the former compared to the latter.  I see no
effective_target_move_max_is_4 or equivalent, and this
instance falls below the threshold of adding one, so I'm
sticking to a list of targets.  For CRIS, 210 would suffice,
but there's no need to be this specific, and it would make
the test even more brittle.
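The size dependence can be seen with a simplified model of the per-move cost: roughly the number of MOVE_MAX_PIECES-sized chunks needed to move an object. This is a sketch, not GCC's estimate_move_cost; it ignores any special handling for large or variable-sized objects, and the helper name is hypothetical.

```cpp
#include <cassert>

// Simplified model (hypothetical helper): the cost of moving an object of
// `size` bytes is the number of MOVE_MAX_PIECES-sized pieces, rounded up.
constexpr int
model_move_cost (int size, int move_max_pieces)
{
  return (size + move_max_pieces - 1) / move_max_pieces;
}

// An IEEE 64-bit double is 8 bytes.
static_assert (model_move_cost (8, 4) == 2, "\"32-bit\" target: two pieces");
static_assert (model_move_cost (8, 8) == 1, "\"64-bit\" target: one piece");
```

Under this model a double-heavy loop body costs up to twice as much on a MOVE_MAX_PIECES == 4 target, which is why such targets hit the peeling limit first.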

PR tree-optimization/118055
* gcc.dg/tree-ssa/pr83403-1.c, gcc.dg/tree-ssa/pr83403-2.c: Add
cris-*-* to targets passing --param=max-completely-peeled-insns=300.

sarif-replay: handle embedded links (§3.11.6)

Handle embedded links in plain text messages. For now, merely
use the link text and discard the destination.
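A minimal sketch of "keep the link text, drop the destination" for SARIF embedded links of the form `[text](target)`. This is not the libsarifreplay code: the function name is hypothetical, and it deliberately ignores the escaped-bracket and nesting rules of the SARIF specification.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace each embedded link "[text](target)" with
// just "text". Simplified: no handling of escaped or nested brackets.
std::string
strip_embedded_links (const std::string &msg)
{
  std::string out;
  std::string::size_type i = 0;
  while (i < msg.size ())
    {
      if (msg[i] == '[')
        {
          auto close_text = msg.find (']', i + 1);
          if (close_text != std::string::npos
              && close_text + 1 < msg.size ()
              && msg[close_text + 1] == '(')
            {
              auto close_dest = msg.find (')', close_text + 2);
              if (close_dest != std::string::npos)
                {
                  // Keep only the link text; skip past the destination.
                  out.append (msg, i + 1, close_text - i - 1);
                  i = close_dest + 1;
                  continue;
                }
            }
        }
      // Not a well-formed link: copy the character through unchanged.
      out += msg[i++];
    }
  return out;
}
```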

gcc/ChangeLog:
* libsarifreplay.cc (struct embedded_link): New.
(maybe_consume_embedded_link): New.
(sarif_replayer::make_plain_text_within_result_message): Handle
embedded links by using the link text, for now.

gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-valid/3.11.6-embedded-links.sarif: New test.
* sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: Update
expected output for handling the embedded links.
* sarif-replay.dg/2.1.0-valid/spec-example-4.sarif: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libgdiagnostics: consolidate logical locations

This patch updates diagnostic_manager_new_logical_location so
that repeated calls with the same input values yield the same
instance of diagnostic_logical_location.

Doing so allows the path-printing logic to properly consolidate runs of
events, whereas previously it could treat each event as having a
distinct logical location, and thus require them to be printed
separately; this greatly improves the output of sarif-replay when
displaying execution paths.
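The "same inputs, same instance" behaviour is essentially interning. A minimal sketch, assuming a hypothetical logical_location with a kind and two nullable names (modelled with `std::optional<std::string>`, whose `operator<` orders an absent value first), stored in a `std::map` keyed by value:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical stand-in for diagnostic_logical_location.
struct logical_location
{
  int kind;
  std::optional<std::string> short_name;
  std::optional<std::string> fully_qualified_name;

  // Lexicographic ordering so values can serve as a std::map key.
  bool operator< (const logical_location &other) const
  {
    return std::tie (kind, short_name, fully_qualified_name)
      < std::tie (other.kind, other.short_name, other.fully_qualified_name);
  }
};

class location_manager
{
public:
  // Repeated calls with equal values return the same instance.
  const logical_location *
  new_logical_location (int kind,
                        std::optional<std::string> short_name,
                        std::optional<std::string> fq_name)
  {
    logical_location key {kind, std::move (short_name), std::move (fq_name)};
    auto it = m_locs.find (key);
    if (it != m_locs.end ())
      return it->second.get ();
    auto owned = std::make_unique<logical_location> (key);
    const logical_location *result = owned.get ();
    m_locs.emplace (std::move (key), std::move (owned));
    return result;
  }

private:
  std::map<logical_location, std::unique_ptr<logical_location>> m_locs;
};
```

Because equal values map to one pointer, a path printer can compare pointers to decide whether consecutive events share a logical location.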

gcc/ChangeLog:
* doc/libgdiagnostics/topics/logical-locations.rst
(diagnostic_manager_new_logical_location): Add note about repeated
calls.
* libgdiagnostics.cc: Define INCLUDE_MAP.
(class owned_nullable_string): Add copy ctor and move ctor.
(owned_nullable_string::operator<): New.
(diagnostic_logical_location::operator<): New.
(diagnostic_manager::new_logical_location): Use m_logical_locs to
"uniquify" instances, converting it to a std::map.
(diagnostic_manager::logical_locs_map_t): New typedef.
(diagnostic_manager::m_logical_locs): Convert from a std::vector
to a std::map.
(diagnostic_execution_path::same_function_p): Update comment.

gcc/testsuite/ChangeLog:
* libgdiagnostics.dg/test-logical-location.c: Include <assert.h>.
Verify that creating a diagnostic_logical_location with equal
values yields the same instance.
* sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: New test.
* sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: Update
expected output to show logical location and for consolidation of
events into runs.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/spec-example-4.sarif: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif-replay: quote source from artifact contents [PR117943]

The diagnostic source-quoting machinery uses class file_cache
implemented in gcc/input.cc for (re)reading the source when
issuing diagnostics.

When sarif-replay issues a saved diagnostic it might be running
in a different path to where the .sarif file was captured, or
on an entirely different machine.

Previously such invocations would lead to the source-quoting
silently failing, even if the content of the file is recorded
in the .sarif file in the artifact "contents" property (which
gcc populates when emitting .sarif output).

This patch:
- adds the ability for slots in file_cache to be populated from memory
rather than from the filesystem
- exposes it in libgdiagnostics via a new entrypoint
- uses this in sarif-replay for any artifacts with a "contents"
property, so that source-quoting uses that rather than trying to read
from the path on the filesystem

gcc/ChangeLog:
PR sarif-replay/117943
* doc/libgdiagnostics/topics/physical-locations.rst
(diagnostic_manager_new_file): Drop "const" from return type.
* doc/libgdiagnostics/tutorial/02-physical-locations.rst: Drop
"const" from "main_file" decl.
* input.cc (file_cache::add_buffered_content): New.
(file_cache_slot::set_content): New.
(file_cache_slot::dump): Use m_file_path being null rather than
m_fp to determine empty slots.  Dump m_fp.
(find_end_of_line): Drop "const" from return type and param.  Add
forward decl.
(file_cache_slot::get_next_line): Fix "const"-ness.
(selftest::test_reading_source_buffer): New.
(selftest::input_cc_tests): Call it.
* input.h (file_cache::add_buffered_content): New decl.
* libgdiagnostics++.h (class file): Drop const-ness from m_inner.
(file::set_buffered_content): New.
* libgdiagnostics.cc (class content_buffer): New.
(diagnostic_file::diagnostic_file): Add "mgr" param.
(diagnostic_file::get_content): New.
(diagnostic_file::set_buffered_content): New.
(diagnostic_file::m_mgr): New.
(diagnostic_file::m_content): New.
(diagnostic_manager::new_file): Drop const-ness.  Pass *this to
ctor.
(diagnostic_file::set_buffered_content): New.
(diagnostic_manager_new_file): Drop "const" from return type.
(diagnostic_file_set_buffered_content): New entrypoint.
(diagnostic_manager_debug_dump_file): Dump the content size,
if any.
* libgdiagnostics.h (diagnostic_manager_new_file): Drop "const"
from return type.
(diagnostic_file_set_buffered_content): New decl.
* libgdiagnostics.map (diagnostic_file_set_buffered_content): New
symbol.
* libsarifreplay.cc (sarif_replayer::m_artifacts_arr): Convert
from json::value to json::array.
(sarif_replayer::handle_run_obj): Call handle_artifact_obj
on all artifacts.
(sarif_replayer::handle_artifact_obj): New.

gcc/testsuite/ChangeLog:
PR sarif-replay/117943
* sarif-replay.dg/2.1.0-valid/error-with-note.sarif: Update
expected output to include quoted source code and underlines.
* sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: New test.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Update expected
output to include quoted source code and underlines.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: move libgdiagnostics dc from sinks into diagnostic_manager

libgdiagnostics was written before the fixes for PR other/116613 allowed
a diagnostic_context to have multiple output sinks.

Hence each libgdiagnostics sink had its own diagnostic_context with just
one diagnostic_output_format.

This wart is no longer necessary and makes it harder to move state
into the manager/context; in particular for quoting source code
from the .sarif file (PR sarif-replay/117943).

Simplify, by making libgdiagnostics' implementation more similar to
GCC's implementation, by moving the diagnostic_context from sink into
diagnostic_manager.

Doing so requires generalizing where the
diagnostic_source_printing_options comes from in class
diagnostic_text_output_format: for GCC we use
the instance within the diagnostic_context, whereas for
libgdiagnostics each diagnostic_text_sink has its own instance.
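A pared-down sketch of that "optional options, defaulting to the context's" constructor pattern. All types here are hypothetical stand-ins, not the real GCC classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GCC classes.
struct source_printing_options
{
  bool enabled = true;
};

struct context
{
  source_printing_options m_source_printing;
};

class text_output_format
{
public:
  // With no options supplied, use the context's own instance (GCC's case);
  // otherwise use the caller's (e.g. a per-sink instance).
  explicit text_output_format (context &dc,
                               source_printing_options *opts = nullptr)
  : m_source_printing (opts ? *opts : dc.m_source_printing)
  {}

  source_printing_options &get_source_printing_options ()
  {
    return m_source_printing;
  }

private:
  source_printing_options &m_source_printing;
};
```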

No functional change intended.

gcc/c-family/ChangeLog:
PR sarif-replay/117943
* c-format.cc (selftest::test_type_mismatch_range_labels): Use
dc.m_source_printing.
* c-opts.cc (c_diagnostic_text_finalizer): Use source-printing
options from text_output.

gcc/cp/ChangeLog:
PR sarif-replay/117943
* error.cc (auto_context_line::~auto_context_line): Use
source-printing options from text_output.

gcc/ChangeLog:
PR sarif-replay/117943
* diagnostic-format-text.cc
(diagnostic_text_output_format::append_note): Use source-printing
options from text_output.
(diagnostic_text_output_format::update_printer): Copy
source-printing options from dc.
(default_diagnostic_text_finalizer): Use source-printing
options from text_output.
* diagnostic-format-text.h
(diagnostic_text_output_format::diagnostic_text_output_format):
Add optional diagnostic_source_printing_options param, using
the context's if null.
(diagnostic_text_output_format::get_source_printing_options): New
accessor.
(diagnostic_text_output_format::m_source_printing): New field.
* diagnostic-path.cc (event_range::print): Use source-printing
options from text_output.
(selftest::test_interprocedural_path_1): Use source-printing
options from dc.
* diagnostic-show-locus.cc
(gcc_rich_location::add_location_if_nearby): Likewise.
(diagnostic_context::maybe_show_locus): Add "opts" param
and use in place of m_source_printing. Pass it to source_policy
ctor.
(diagnostic_source_print_policy::diagnostic_source_print_policy):
Add overload taking a const diagnostic_source_printing_options &.
* diagnostic.cc (diagnostic_context::initialize): Pass nullptr
for source options when creating text sink, so that it uses
the dc's options.
(diagnostic_context::dump): Add an "output sinks:" heading and
print "(none)" if there aren't any.
(diagnostic_context::set_output_format): Split out code into...
(diagnostic_context::remove_all_output_sinks): ...this new
function.
* diagnostic.h
(diagnostic_source_print_policy::diagnostic_source_print_policy):
Add overload taking a const diagnostic_source_printing_options &.
(diagnostic_context::maybe_show_locus): Add "opts" param.
(diagnostic_context::remove_all_output_sinks): New decl.
(diagnostic_context::m_source_printing): New field.
(diagnostic_show_locus): Add "opts" param and pass to
maybe_show_locus.
* libgdiagnostics.cc (sink::~sink): Delete.
(sink::begin_group): Delete.
(sink::end_group): Delete.
(sink::emit): Delete.
(sink::m_dc): Drop field.
(diagnostic_text_sink::on_begin_text_diagnostic): Delete.
(diagnostic_text_sink::get_source_printing_options): Use
m_source_printing.
(diagnostic_text_sink::m_current_logical_loc): Drop field.
(diagnostic_text_sink::m_inner_sink): New field.
(diagnostic_text_sink::m_source_printing): New field.
(diagnostic_manager::diagnostic_manager): Update for changes
to fields. Initialize m_dc.
(diagnostic_manager::~diagnostic_manager): Call diagnostic_finish.
(diagnostic_manager::get_file_cache): Drop.
(diagnostic_manager::get_dc): New accessor.
(diagnostic_manager::begin_group): Reimplement.
(diagnostic_manager::end_group): Reimplement.
(diagnostic_manager::get_prev_diag_logical_loc): New accessor.
(diagnostic_manager::m_dc): New field.
(diagnostic_manager::m_file_cache): Drop field.
(diagnostic_manager::m_edit_context): Convert to a std::unique_ptr
so that object can be constructed after m_dc is initialized.
(diagnostic_manager::m_prev_diag_logical_loc): New field.
(diagnostic_text_sink::diagnostic_text_sink): Reimplement.
(get_color_rule): Delete.
(diagnostic_text_sink::set_colorize): Reimplement.
(diagnostic_text_sink::text_starter): New.
(sarif_sink::sarif_sink): Reimplement.
(diagnostic_manager::write_patch): Update for change to
m_edit_context.
(diagnostic_manager::emit): Update now that each sink has a
corresponding diagnostic_output_format object within m_dc.

gcc/fortran/ChangeLog:
PR sarif-replay/117943
* error.cc (gfc_diagnostic_text_starter): Use source-printing
options from text_output.

gcc/testsuite/ChangeLog:
PR sarif-replay/117943
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.cc
(custom_diagnostic_text_finalizer): Use source-printing options
from text_output.
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc
(xhtml_builder::make_element_for_diagnostic): Use source-printing
options from diagnostic_context.
* gcc.dg/plugin/expensive_selftests_plugin.cc (test_richloc):
Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: implement file_cache::dump

This is purely for use when debugging.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::dump): Dump m_file_cache.
* input.cc (file_cache_slot::dump): New decls and implementations.
(file_cache::dump): New.
* input.h (file_cache::dump): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: Require int32plus target for gcc.dg/pr117816.c

Memmove destination overflows if size of int is less than 3, resulting in
spurious test failures. Fix by adding a requirement for effective
target int32plus.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117816.c: Require effective target int32plus.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

docs: Fix [us]abd pattern name.

The uabd and sabd optab names are missing a "3" suffix (for their three
arguments). This patch adds it.

gcc/ChangeLog:

* doc/md.texi: Add "3" suffix.

vect: Do not try to duplicate_and_interleave one-element mode.

PR112694 shows that we try to create sub-vectors of single-element
vectors because can_duplicate_and_interleave_p returns true.
The problem resurfaced in PR116611.

This patch makes can_duplicate_and_interleave_p return false
if count / nvectors > 0 and removes the corresponding check in the riscv
backend.

This partially gets rid of the FAIL in slp-19a.c.  At least when built
with cost model we don't have LOAD_LANES anymore.  Without cost model,
as in the test suite, we choose a different path and still end up with
LOAD_LANES.

Bootstrapped and regtested on x86 and power10, regtested on
rv64gcv_zvfh_zvbb.  Still waiting for the aarch64 results.

Regards
Robin

gcc/ChangeLog:

PR target/112694
PR target/116611

* config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
return.
* tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
false when we cannot create sub-elements.

RISC-V: Fix compress shuffle pattern [PR117383].

This patch makes vcompress use the tail-undisturbed policy by default
and also uses the proper VL.
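To see why the tail policy matters, here is a scalar model (not GCC code) of RVV vcompress: the source elements whose mask bit is set are packed to the front of the destination, and the remaining elements are treated as tail elements. Under a tail-undisturbed (TU) policy they keep their previous contents; under tail-agnostic they may be overwritten. The function is hypothetical and assumes `src` and `mask` have at least `vl` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of vcompress under a tail-undisturbed policy.
std::vector<int>
vcompress_tu (std::vector<int> dest, const std::vector<int> &src,
              const std::vector<bool> &mask, std::size_t vl)
{
  std::size_t out = 0;
  for (std::size_t i = 0; i < vl; ++i)
    if (mask[i])
      dest[out++] = src[i];
  // Elements from index `out` onwards are left exactly as they were.
  return dest;
}
```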

PR target/117383

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Use TU policy.
* config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c:
Expect tu.
* gcc.target/riscv/rvv/autovec/pr117383.c: New test.

RISC-V: Increase cost for vec_construct [PR118019].

For a generic vec_construct from scalar elements we need to load each
scalar element and move it over to a vector register.
Right now we only use a cost of 1 per element.

This patch uses register-move cost as well as scalar_to_vec and
multiplies it with the number of elements in the vector instead.
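The change in arithmetic can be sketched as below. This is an illustration under a stated assumption, not the riscv.cc code: it assumes the register-move and scalar_to_vec costs are simply added per element, and the function names and cost values are hypothetical.

```cpp
#include <cassert>

// Hypothetical model: every scalar element is moved into a vector
// register, so the per-element cost (assumed here to be register-move
// cost + scalar_to_vec cost) is paid once per element.
constexpr int
vec_construct_cost (int nunits, int regmove_cost, int scalar_to_vec_cost)
{
  return nunits * (regmove_cost + scalar_to_vec_cost);
}

// Previous behaviour as described: a cost of 1 per element.
constexpr int
old_vec_construct_cost (int nunits)
{
  return nunits;
}
```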

PR target/118019

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_builtin_vectorization_cost):
Increase vec_construct cost.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118019.c: New test.

libstdc++: Initialize all members of hashtable local iterators

Currently the _M_bucket members are left uninitialized for
default-initialized local iterators, and then copy construction copies
indeterminate values. We should just ensure they're initialized on
construction.

Setting them to zero makes default-initialization consistent with
value-initialization and avoids indeterminate values.

For the _Local_iterator_base<..., false> specialization we preserve the
existing behaviour of setting _M_bucket_count to -1 in the default
constructor, as a sentinel value to indicate there's no hash object
present.
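A simplified sketch of the default member-initializer approach, using hypothetical structs rather than the real _Local_iterator_base specializations. It assumes the cached-hash case zeroes the bucket count, as the text says for _M_bucket, while the uncached case keeps -1 as its sentinel.

```cpp
#include <cassert>
#include <cstddef>

// Hash code cached in the node: every member gets a default initializer,
// so default-initialization and value-initialization agree.
struct local_iterator_cached
{
  void *m_cur = nullptr;
  std::size_t m_bucket = 0;
  std::size_t m_bucket_count = 0;
};

// No cached hash code: a bucket count of -1 is the sentinel meaning
// "no hash object present".
struct local_iterator_uncached
{
  void *m_cur = nullptr;
  std::size_t m_bucket = 0;
  std::size_t m_bucket_count = std::size_t (-1);
};
```

Copying a default-initialized object now copies well-defined values instead of indeterminate ones.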

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Local_iterator_base): Use
default member-initializers.

libstdc++: Use alias-declarations in bits/hashtable_policy.h

This file is only for C++11 and later, so replace typedefs with
alias-declarations for clarity. Also remove redundant std::
qualification on size_t, ptrdiff_t etc.

We can also remove the result_type, first_argument_type and
second_argument_type typedefs from the range hashers. We don't need
those types to follow the C++98 adaptable function object protocol.
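A sketch of what the range hashers reduce to once the adaptable typedefs are gone: plain function objects that map a hash code to a bucket index. These are simplified look-alikes with hypothetical names, shown alongside an alias-declaration.

```cpp
#include <cassert>
#include <cstddef>

// C++11 alias-declaration, equivalent to: typedef std::size_t bucket_index;
using bucket_index = std::size_t;

// Bucket = hash code modulo bucket count.
struct mod_range_hashing
{
  bucket_index operator() (std::size_t num, std::size_t den) const noexcept
  { return num % den; }
};

// Assumes den is a power of two, so masking equals the modulo.
struct mask_range_hashing
{
  bucket_index operator() (std::size_t num, std::size_t den) const noexcept
  { return num & (den - 1); }
};
```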

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h: Replace typedefs with
alias-declarations. Remove redundant std:: qualification.
(_Mod_range_hashing, _Mask_range_hashing): Remove adaptable
function object typedefs.

libstdc++: Simplify storage of hasher in local iterators

The fix for PR libstdc++/56267 (relating to the lifetime of the hash
object stored in a local iterator) has undefined behaviour, as it relies
on being able to call a member function on an empty object that never
started its lifetime. Although the member function probably doesn't care
about the empty object's state, this is still technically undefined
because there is no object of that type at that address. It's also
possible that the hash object would have a stricter alignment than the
_Hash_code_storage object, so that the reinterpret_cast would produce a
misaligned pointer.

This fix replaces _Local_iterator_base's _Hash_code_storage base-class
with a new class template containing a potentially-overlapping (i.e.
[[no_unique_address]]) union member. This means that we always have
storage of the correct type, and it can be initialized/destroyed when
required. We no longer need a reinterpret_cast that gives us a pointer
that we should not dereference.

It would be nice if we could just use a union containing the _Hash
object as a data member of _Local_iterator_base, but that would be an
ABI change. The _Hash_code_storage that contains the _Hash object is the
first base-class, before the _Node_iterator_base base-class. Making the
union a data member of _Local_iterator_base would make it come after the
_Node_iterator_base base instead of before it, altering the layout.

Since we're changing _Hash_code_storage anyway, we can replace it with a
new class template that stores the _Hash object itself in the union,
rather than a _Hash_code_base that holds the _Hash. This removes an
unnecessary level of indirection in the class hierarchy. This change
requires the effects of _Hash_code_base::_M_bucket_index to be inlined
into the _Local_iterator_base::_M_incr function, but that's easy.

We don't need separate specializations of _Hash_obj_storage for an empty
hash function and a non-empty one. Using [[no_unique_address]] gives us
an empty base-class when possible.
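A minimal sketch of the idea: a union member gives storage of exactly the hash type (so size and alignment are right), and the owner begins and ends the object's lifetime explicitly, with no reinterpret_cast. The class and its members are hypothetical and much simpler than _Hash_obj_storage; no size guarantees are implied.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <new>

template <typename Hash>
struct hash_obj_storage
{
  union storage
  {
    storage () noexcept { }   // leave m_hash uninitialized
    ~storage () { }           // the owner destroys m_hash explicitly
    Hash m_hash;
  };

  [[no_unique_address]] storage m_storage;

  // Start the hash object's lifetime in correctly typed storage.
  void init (const Hash &h)
  { ::new (static_cast<void *> (std::addressof (m_storage.m_hash))) Hash (h); }

  // End its lifetime.
  void destroy () noexcept { m_storage.m_hash.~Hash (); }

  const Hash &hash () const noexcept { return m_storage.m_hash; }
};
```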

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Hash_code_storage): Remove.
(_Hash_obj_storage): New class template. Store the hash
function as a union member instead of using a byte buffer.
(_Local_iterator_base): Use _Hash_obj_storage instead of
_Hash_code_storage, adjust members that construct and destroy
the hash object.
(_Local_iterator_base::_M_incr): Calculate bucket index.

libstdc++: Further simplify _Hashtable inheritance hierarchy

The main change here is using [[no_unique_address]] instead of the Empty
Base-class Optimization. Using the attribute allows us to use data
members instead of base-classes. That simplifies the inheritance
hierarchy, which means less work for the compiler. It also means that
ADL has fewer associated classes and associated namespaces to consider,
further reducing the work the compiler has to do.

Reducing the differences between the _Hashtable_ebo_helper primary
template and the partial specialization means we no longer need to use
member functions to access the stored object, because it's now always a
data member called _M_obj. This means we can also remove a number of
other helper functions that were using those member functions to access
the object, for example we can swap the _Hash and _Equal objects
directly in _Hashtable::swap instead of calling _Hashtable_base::_M_swap
which then calls _Hash_code_base::_M_swap.

Although [[no_unique_address]] would allow us to reduce the size for
empty types that are also 'final', doing so would be an ABI break
because those types were previously excluded from using the EBO. So we
still need the _Hashtable_ebo_helper class template and a partial
specialization, so that we only use the attribute under exactly the same
conditions as we previously used the EBO. This could be avoided with a
non-standard [[no_unique_address(expr)]] attribute that took a boolean
condition, or with reflection and token sequence injection, but we don't
have either of those things.

Because _Hashtable_ebo_helper is no longer used as a base-class we don't
need to disambiguate possible identical bases, so it doesn't need an
integral non-type template parameter.
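A sketch of the conditional-attribute trick: the stored object is always a data member named m_obj, and a partial specialization adds [[no_unique_address]] only under the old EBO condition (empty and not final), so existing layouts stay the same. The names are hypothetical; the `table::swap` shows swapping the stored objects directly, without intermediate _M_swap helpers.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

template <typename T,
          bool = std::is_empty<T>::value && !std::is_final<T>::value>
struct ebo_helper
{
  T m_obj;
};

// Only types that previously qualified for the EBO get the attribute.
template <typename T>
struct ebo_helper<T, true>
{
  [[no_unique_address]] T m_obj;
};

template <typename Hash, typename Equal>
struct table
{
  ebo_helper<Hash> m_hash;
  ebo_helper<Equal> m_eq;

  void swap (table &other)
  {
    using std::swap;
    swap (m_hash.m_obj, other.m_hash.m_obj);
    swap (m_eq.m_obj, other.m_eq.m_obj);
  }
};
```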

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::swap): Swap hash
function and equality predicate here. Inline allocator swap
instead of using __alloc_on_swap.
* include/bits/hashtable_policy.h (_Hashtable_ebo_helper):
Replace EBO with no_unique_address attribute. Remove NTTP.
(_Hash_code_base): Replace base class with data member using
no_unique_address attribute.
(_Hash_code_base::_M_swap): Remove.
(_Hash_code_base::_M_hash): Remove.
(_Hashtable_base): Replace base class with data member using
no_unique_address attribute.
(_Hashtable_base::_M_swap): Remove.
(_Hashtable_alloc): Replace base class with data member using
no_unique_address attribute.

libstdc++: Fix fancy pointer support in linked lists [PR57272]

The union members I used in the new _Node types for fancy pointers only
work for value types that are trivially default constructible. This
change replaces the anonymous union with a named union so it can be
given a default constructor and destructor, to leave the variant member
uninitialized.
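A hypothetical node sketching why a named union helps. With an anonymous union holding a non-trivially-constructible T, the node's implicit default constructor would be deleted; giving the union its own user-provided constructor and destructor leaves the value uninitialized, and the container constructs and destroys it explicitly.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

template <typename T>
struct node
{
  node *next = nullptr;

  union value_storage
  {
    value_storage () noexcept { }   // value left uninitialized
    ~value_storage () { }           // value destroyed by the owner
    T value;
  } storage;

  template <typename... Args>
  void construct_value (Args &&... args)
  {
    ::new (static_cast<void *> (std::addressof (storage.value)))
      T (std::forward<Args> (args)...);
  }

  void destroy_value () noexcept { storage.value.~T (); }
};
```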

This also fixes the incorrect macro names in the alloc_ptr_ignored.cc
tests as pointed out by François, and fixes some std::list pointer
confusions that the fixed alloc_ptr_ignored.cc test revealed.

libstdc++-v3/ChangeLog:

PR libstdc++/57272
* include/bits/forward_list.h (__fwd_list::_Node): Add
user-provided special member functions to union.
* include/bits/stl_list.h (__list::_Node): Likewise.
(_Node_base::_M_hook, _Node_base::swap): Use _M_base() instead
of std::pointer_traits::pointer_to.
(_Node_base::_M_transfer): Likewise. Add noexcept.
(_List_base::_M_put_node): Use 'if constexpr' to avoid using
pointer_traits::pointer_to when not necessary.
(_List_base::_M_destroy_node): Fix parameter to be the pointer
type used internally, not the allocator's pointer.
(list::_M_create_node): Likewise.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
Check explicit instantiation of non-trivial value type.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
Likewise.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
Fix macro name.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
Likewise.

Fix non-aligned CodeView symbols

CodeView symbols in PDB files are aligned to four-byte boundaries. It's
not really clear what logic MSVC uses to enforce this; sometimes the
symbols are padded in the object file, sometimes the linker seems to do
the work.

It makes more sense to do this in the compiler, so fix the two instances
where we can write symbols with a non-aligned length. S_FRAMEPROC is
unusually not a multiple of 4, so it will always have 2 bytes of padding.
S_INLINESITE is followed by variable-length "binary annotations", so it
will also usually need padding.
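The padding arithmetic for a four-byte boundary is small enough to show directly. The helper is hypothetical (not from dwarf2codeview.cc), and which bytes are used as padding is deliberately left out. A record whose length is 2 modulo 4, as described for S_FRAMEPROC, needs 2 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Number of padding bytes that bring a record of `len` bytes up to the
// next four-byte boundary (0 if it is already aligned).
constexpr std::size_t
codeview_padding (std::size_t len)
{
  return (4 - len % 4) % 4;
}
```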

gcc/
* dwarf2codeview.cc (write_s_frameproc): Align output.
(write_s_inlinesite): Align output.

Daily bump.

hppa: Implement TARGET_FRAME_POINTER_REQUIRED

If a function receives nonlocal gotos, it needs to save the frame
pointer in the argument save area. This ensures that LRA sets
frame_pointer_needed when it saves arguments in the save area.

2024-12-15 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/118018
* config/pa/pa.cc (pa_frame_pointer_required): Declare and
implement.
(TARGET_FRAME_POINTER_REQUIRED): Define.