git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Fix std::variant to reject array types [PR116381]

libstdc++-v3/ChangeLog:

PR libstdc++/116381
* include/std/variant (variant): Fix conditions for
static_assert to match the spec.
* testsuite/20_util/variant/types_neg.cc: New test.

c++, coroutines: Check for malformed functions before splitting.

This performs the same basic check that is done by finish_function
to catch cases where the function is so badly malformed that we
do not have a consistent binding level.

gcc/cp/ChangeLog:

* coroutines.cc (split_coroutine_body_from_ramp): Check
that the binding level is as expected before attempting
to outline the function body.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

testsuite: i386: Fix g++.target/i386/pr116275-2.C on Solaris/x86

The new g++.target/i386/pr116275-2.C test FAILs on 32-bit Solaris/x86:

FAIL: g++.target/i386/pr116275-2.C scan-assembler vpslld

This happens because Solaris defaults to -mstackrealign, disabling -mstv.

Fixed by disabling the former and enabling the latter.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-08-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* g++.target/i386/pr116275-2.C (dg-options): Add -mstv
-mno-stackrealign.

Fortran: Fix ICE in sizeof(coarray) [PR77518]

Use se's class_container where present in sizeof().

PR fortran/77518

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_sizeof): Use
class_container of se when set.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/sizeof_1.f90: New test.

rs6000: Remove "+" constraint modifier from *vsx_le_perm_store_* insns

Since *vsx_le_perm_store_* can be split into vector
permute and vector store, after reload_completed, we reuse
the operand 1 as the destination of vector permute, so we
set operand 1 with constraint modifier "+".  But since
it's taken as pure input in DF and most passes as Richard
pointed out in [1], to ensure it's correct when operand 1
is still live, we actually restore the operand 1's value
after the store with vector permute, that is:
  op1 = vector permute op1 (doubleword swapping)
  op0 = op2
  op1 = vector permute op1 (doubleword swapping)
, it means op1's value isn't changed by this insn.

So according to the comments from Richard and Segher in
that thread, this patch is to remove the "+" constraint
modifier of operand 1 from *vsx_le_perm_store_* insns.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660145.html

gcc/ChangeLog:

* config/rs6000/vsx.md (define_insn *vsx_le_perm_store_{<VSX_D:mode>,
<VSX_W:mode>,v8hi,v16qi,<VSX_LE_128:mode>}): Remove constraint modifier
"+" from operand 1.

rs6000: Fix vsx_le_perm_store_* splitters for !reload_completed

For vsx_le_perm_store_* we have two splitters, one is for
!reload_completed and the other is for reload_completed.
As Richard pointed out in [1], operand 1 here is a pure
input for DF and most passes, but it could be used as the
vector rotation (64 bit) destination of itself, so we
re-compute the source (back to the original value) for
the case reload_completed, while for !reload_completed we
generate one new pseudo, so both cases are fine if operand
1 is still live after this insn. But according to the
source code, for !reload_completed case, it can logically
reuse the operand 1 as the new pseudo generation is
conditional on can_create_pseudo_p, then it can cause
wrong result once operand 1 is live. So considering this
and there is no splitting for this when reload_in_progress,
this patch is to fix the code to assert can_create_pseudo_p
there, so that both !reload_completed and reload_completed
cases would ensure operand 1 is unchanged (pure input), it
is also prepared for the following up patch which would
strip the unnecessary INOUT constraint modifier "+".

This also fixes an oversight in the splitter for VSX_LE_128
(!reload_completed), it should use operand 1 rather than
operand 0.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660145.html

gcc/ChangeLog:

* config/rs6000/vsx.md (*vsx_le_perm_store_{<VSX_D:mode>,<VSX_W:mode>,
v8hi,v16qi,<VSX_LE_128:mode>} !reload_completed splitters): Assert
can_create_pseudo_p and always generate one new pseudo for operand 1.

testsuite, rs6000: Remove all powerpc-*paired* uses

Similar to r15-710-g458b23bc8b3e2b which removed all uses of
powerpc-*-linux*paired*, this patch is to remove the remaining
powerpc-*paired* uses which I missed to catch with "*linux*"
in search keyword.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_vect_support_and_set_flags): Remove
the if arm checking powerpc-*paired*.
(check_750cl_hw_available): Remove.
(check_effective_target_vect_unpack): Remove the check on
powerpc-*paired*.

Align predicates for operands[1] between mov<mode> and *mov<mode>_internal.

> It's not obvious to me why movv16qi requires a nonimmediate_operand
> > source, especially since ix86_expand_vector_mode does have code to
> > cope with constant operand[1]s. emit_move_insn_1 doesn't check the
> > predicates anyway, so the predicate will have little effect.
> >
> > A workaround would be to check legitimate_constant_p instead of the
> > predicate, but I'm not sure that that should be necessary.
> >
> > Has this already been discussed? If not, we should loop in the x86
> > maintainers (but I didn't do that here in case it would be a repeat).
>
> I also noticed it. Not sure why movv16qi requires a
> nonimmediate_operand, while ix86_expand_vector_mode could deal with
> constant op. Looking forward to Hongtao's comments.
The code has been there since 2005 before I'm involved.
It looks to me at the beginning both mov<mode> and
*mov<mode>_internal only support nonimmediate_operand for the
operands[1].
And r0-75606-g5656a184e83983 adjusted the nonimmediate_operand to
nonimmediate_or_sse_const_operand for *mov<mode>_internal, but not for
mov<mode>. I think we can align the predicate between mov<mode>
and *mov<mode>_internal.

gcc/ChangeLog:

* config/i386/sse.md (mov<mode>): Align predicates for
operands[1] between mov<mode> and *mov<mode>_internal.
* config/i386/mmx.md (mov<mode>): Ditto.

Daily bump.

builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

On aarch64 (without !CSSC instructions), since popcount is implemented using the SIMD instruction cnt,
instead of using two SIMD cnt (V8QI mode), it is better to use one 128bit cnt (V16QI mode). And only one
reduction addition instead of 2. Currently fold_builtin_bit_query will expand always without checking
if there was an optab for the type, so this changes that to check the optab to see if we should expand
or have the backend handle it.

Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* builtins.cc (fold_builtin_bit_query): Don't expand double
`unsigned long long` typess if there is an optab entry for that
type.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

Sometimes initialize_sanitizer_builtins is not called before emitting
the asan builtins with hwasan. In the case of the bug report, there
was a path with the fortran front-end where it was not called.
So let's call it in asan_instrument before calling transform_statements
and from hwasan_finish_file.

Built and tested for aarch64-linux-gnu with no regressions.

Changes since v1:
* v2: Add call of asan_instrument to hwasan_finish_file also.

gcc/ChangeLog:

PR sanitizer/115205
* asan.cc (asan_instrument): Call initialize_sanitizer_builtins
for hwasan.
(hwasan_finish_file): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: Fix one typo in .SAT_TRUNC test func name [NFC]

Fix one typo `sat_truc` to `sat_trunc`, as well as `SAT_TRUC` to `SAT_TRUNC`.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Fix SAT_TRUNC typo.
* gcc.target/riscv/sat_u_trunc-1.c: Ditto.
* gcc.target/riscv/sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/sat_u_trunc-9.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-1.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-13.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-14.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-15.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-3.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-4.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-5.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-6.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-7.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-8.c: Ditto.
* gcc.target/riscv/sat_u_trunc-run-9.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

c++/modules: Remove unnecessary errors when not writing compiled module

It was pointed out to me that the current error referencing an internal
linkage entity reads almost like an ICE message, with the message
finishing with the unhelpful:

m.cpp:1:8: error: failed to write compiled module: Bad file data
    1 | export module M;
      |        ^~~~~~

Similarly, whenever we decide not to emit a module CMI due to other
errors we currently emit the following message:

m.cpp:1:8: warning: not writing module ‘M’ due to errors
    1 | export module M;
      |        ^~~~~~

Neither of these messages really add anything useful; users already
understand that when an error is reported then the normal outputs will
not be created, so these messages are just noise.

There is one case we still need this latter message, however; when an
error in a template has been silenced with '-Wno-template-body' we still
don't want to write a module CMI, so emit an error now instead.

This patch also removes a number of dg-prune-output directives in the
testsuite that are no longer needed with this change.

gcc/cp/ChangeLog:

* module.cc (module_state::write_begin): Return a boolean to
indicate errors rather than just doing set_error().
(finish_module_processing): Prevent emission of unnecessary
errors; only indicate module writing occurred if write_begin
succeeds.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-1.C: Remove message.
* g++.dg/modules/internal-1.C: Remove message.
* g++.dg/modules/ambig-2_b.C: Remove unnecessary pruning.
* g++.dg/modules/atom-decl-2.C: Likewise.
* g++.dg/modules/atom-pragma-3.C: Likewise.
* g++.dg/modules/atom-preamble-2_f.C: Likewise.
* g++.dg/modules/block-decl-2.C: Likewise.
* g++.dg/modules/dir-only-4.C: Likewise.
* g++.dg/modules/enum-12.C: Likewise.
* g++.dg/modules/exp-xlate-1_b.C: Likewise.
* g++.dg/modules/export-3.C: Likewise.
* g++.dg/modules/friend-3.C: Likewise.
* g++.dg/modules/friend-5_b.C: Likewise.
* g++.dg/modules/inc-xlate-1_e.C: Likewise.
* g++.dg/modules/linkage-2.C: Likewise.
* g++.dg/modules/local-extern-1.C: Likewise.
* g++.dg/modules/main-1.C: Likewise.
* g++.dg/modules/map-2.C: Likewise.
* g++.dg/modules/mod-decl-1.C: Likewise.
* g++.dg/modules/mod-decl-3.C: Likewise.
* g++.dg/modules/pr99174.H: Likewise.
* g++.dg/modules/pr99468.H: Likewise.
* g++.dg/modules/token-1.C: Likewise.
* g++.dg/modules/token-3.C: Likewise.
* g++.dg/modules/token-4.C: Likewise.
* g++.dg/modules/token-5.C: Likewise.
* g++.dg/modules/using-10.C: Likewise.
* g++.dg/modules/using-12.C: Likewise.
* g++.dg/modules/using-3.C: Likewise.
* g++.dg/modules/using-9.C: Likewise.
* g++.dg/modules/using-enum-2.C: Likewise.
* g++.dg/modules/permissive-error-1.C: New test.
* g++.dg/modules/permissive-error-2.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

match: Reject non-ssa name/min invariants in gimple_extract [PR116412]

After the conversion for phiopt's conditional operand
to use maybe_push_res_to_seq, it was found that gimple_extract
will extract out from REALPART_EXPR/IMAGPART_EXPR/VCE and BIT_FIELD_REF,
a memory load. But that extraction was not needed as memory loads are not
simplified in match and simplify. So gimple_extract should return false
in those cases.

Changes since v1:
* Move the rejection to gimple_extract from factor_out_conditional_operation.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116412

gcc/ChangeLog:

* gimple-match-exports.cc (gimple_extract): Return false if op0
was not a SSA name nor a min invariant for REALPART_EXPR/IMAGPART_EXPR/VCE
and BIT_FIELD_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116412-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libstdc++: Remove redundant reclaration of std::optional

We've already declared optional at the top of the header, so don't need
to do it again.

libstdc++-v3/ChangeLog:

* include/std/optional: Remove redundant redeclaration.

libstdc++: Fix indentation of lines that follow a [[likely]] attribute

libstdc++-v3/ChangeLog:

* include/std/text_encoding: Fix indentation.

libstdc++: Adjust testcase for constexpr placement new [PR115744]

This test now fails in C++26 mode because the declaration in <new> is
constexpr and the one in the test isn't. Add constexpr to the test.

libstdc++-v3/ChangeLog:

PR libstdc++/115744
* testsuite/18_support/headers/new/synopsis.cc [C++26]: Add
constexpr to placement operator new and operator new[].

libcpp: Adjust lang_defaults

The table over the years turned to be very wide, 147 columns
and any addition would add a couple of new ones.
We need a 28x23 bit matrix right now.

This patch changes the formatting, so that we need just 2 columns
per new feature and so we have some room for expansion.
In addition, the patch changes it to bitfields, which reduces
.rodata by 532 bytes (so 5.75x reduction of the variable) and
on x86_64-linux grows the cpp_set_lang function by 26 bytes (8.4%
growth).

2024-08-20 Jakub Jelinek <jakub@redhat.com>

* init.cc (struct lang_flags): Change all members from char
typed fields to unsigned bit-fields.
(lang_defaults): Change formatting of the initializer so that it
fits to 68 columns rather than 147.

phi-opt: Fix for failing maybe_push_res_to_seq in factor_out_conditional_operation [PR 116409]

The code was assuming that maybe_push_res_to_seq would not fail if the gimple_extract_op returned true.
But for some cases when the function is pure rather than const, then it can fail.
This change moves around the code to check the result of maybe_push_res_to_seq instead of assuming it will
always work.

Changes since v1:
* v2: Instead of directly testing non-pure builtin functions change to test if maybe_push_res_to_seq fails.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/116409

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move
maybe_push_res_to_seq before creating the phi node and the debug dump.
Return false if maybe_push_res_to_seq fails.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116409-1.c: New test.
* gcc.dg/torture/pr116409-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: Appertain standard attributes after array closing square bracket to array type rather than declarator [PR110345]

For C++ 26 P2552R3 I went through all the spots (except modules) where
attribute-specifier-seq appears in the grammar and tried to construct
a testcase in all those spots, for now for [[deprecated]] attribute.

This is the second issue I found. The comment already correctly says that
attributes after closing ] appertain to the array type, but we were
appending them to returned_attrs, so effectively applying them to the
declarator (as if they appeared right after declarator-id).

2024-08-20 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* decl.cc (grokdeclarator): Apply declarator->std_attributes
for cdk_array to type, rather than chaining it to returned_attrs.

* g++.dg/cpp0x/gen-attrs-82.C: New test.
* g++.dg/gomp/attrs-3.C (foo): Expect different diagnostics for
omp::directive attribute after closing square bracket of an automatic
declaration and add a test with the attribute after array's
declarator-id.

c++: Parse and ignore attributes on base specifiers [PR110345]

For C++ 26 P2552R3 I went through all the spots (except modules) where
attribute-specifier-seq appears in the grammar and tried to construct
a testcase in all those spots, for now for [[deprecated]] attribute.

This is the third issue I found.

https://eel.is/c++draft/class.derived#general-1 has attribute-specifier-seq
at the start of base-specifier. The following patch parses it there and
warns about those.

2024-08-20 Jakub Jelinek <jakub@redhat.com>

PR c++/110345
* parser.cc (cp_parser_base_specifier): Parse standard attributes
at the start and emit a warning if there are any non-ignored ones.

* g++.dg/cpp0x/gen-attrs-83.C: New test.

RISC-V: Remove testcase XFAIL

The testcase has been modified to include the -fwrapv flag which now
causes the test to pass. Remove the xfail exception

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-5.c: Remove riscv xfail exception

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

c++: Improve errors parsing a braced list [PR101232]

PR c++/101232

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Commit to the
parse in case we know its either a cast or invalid syntax.
(cp_parser_braced_list): Add a heuristic to inform about
missing comma or operator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-err1.C: New test.
* g++.dg/cpp0x/initlist-err2.C: New test.
* g++.dg/cpp0x/initlist-err3.C: New test.

Signed-off-by: Franciszek Witt <franek.witt@gmail.com>

doc: Normalize reference to binutils version for C6X

We generally do not use a hyphen between project name and version.

gcc:
* doc/install.texi (Specific) <c6x-*-*>: Normalize reference to
binutils.

Match: Add pattern for `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c` [PR103660]

This adds a pattern to convert `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c`
which is simplier. It adds both for cond and vec_cond; even though vec_cond is
handled via a different pattern currently but requires extra steps for matching
so this should be slightly faster.

Also handle it for xor and plus too since those can be handled the same way.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/103660

gcc/ChangeLog:

* match.pd (`(a ? b : 0) | (a ? 0 : c)`): New pattern.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-4.C: New test.
* gcc.dg/tree-ssa/pr103660-4.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

match: extend the `((a CMP b) ? c : 0) | ((a CMP' b) ? d : 0)` patterns to support ^ and + [PR103660]

r13-4620-g4d9db4bdd458 Added a few patterns and some of them can be extended to support XOR and PLUS.
This extends the patterns to support XOR and PLUS instead of just IOR.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/103660

gcc/ChangeLog:

* match.pd (`((a CMP b) ? c : 0) | ((a CMP' b) ? d : 0)`): Extend to support
XOR and PLUS.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-2.C: New test.
* g++.dg/tree-ssa/pr103660-3.C: New test.
* gcc.dg/tree-ssa/pr103660-2.c: New test.
* gcc.dg/tree-ssa/pr103660-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Add testcases for part of PR 103660

IOR part of the bug report was fixed by r13-4620-g4d9db4bdd458 but
that added only aarch64 specific testcases. This adds 4
generic testcases for this to check to make sure they are optimized.
The C++ testcases are the vector type versions.

PR tree-optimization/103660

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-0.C: New test.
* g++.dg/tree-ssa/pr103660-1.C: New test.
* gcc.dg/tree-ssa/pr103660-0.c: New test.
* gcc.dg/tree-ssa/pr103660-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: default targ eligibility refinement [PR101463]

On Tue, 9 Jan 2024, Jason Merrill wrote:
> On 1/5/24 15:01, Patrick Palka wrote[1]:
> > Here during default template argument substitution we wrongly consider
> > the (substituted) default arguments v and vt<int> as value-dependent[1]
> > which ultimately leads to deduction failure for the calls.
> >
> > The bogus value_dependent_expression_p result aside, I noticed
> > type_unification_real during default targ substitution keeps track of
> > whether all previous targs are known and non-dependent, as is the case
> > for these calls. And in such cases it should be safe to avoid checking
> > dependence of the substituted default targ and just assume it's not.
> > This patch implements this optimization, which lets us accept both
> > testcases by sidestepping the value_dependent_expression_p issue
> > altogether.
>
> Hmm, maybe instead of substituting and asking if it's dependent, we should
> specifically look for undeduced parameters.

This patch implements this refinement, which incidentally fixes PR101463
just as well.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641957.html

PR c++/101463

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Directly look for undeduced
parameters in the default argument instead of doing a trial
substitution.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype6.C: New test.
* g++.dg/cpp1z/nontype6a.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libcpp: replace SSE4.2 helper with an SSSE3 one

Since the characters we are searching for (CR, LF, '\', '?') all have
distinct ASCII codes mod 16, PSHUFB can help match them all at once.

Directly use the new helper if __SSSE3__ is defined. It makes the other
helpers unused, so mark them inline to prevent warnings.

Rewrite and simplify init_vectorized_lexer.

libcpp/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for SSSE3 instead of SSE4.2.
* files.cc (read_file_guts): Bump padding to 64 if HAVE_SSSE3.
* lex.cc (search_line_acc_char): Mark inline, not "unused".
(search_line_sse2): Mark inline.
(search_line_sse42): Replace with...
(search_line_ssse3): ... this new function. Adjust the use...
(init_vectorized_lexer): ... here. Simplify.

tree-optimization/116274 - overzealous SLP vectorization

The following tries to address that the vectorizer fails to have
precise knowledge of argument and return calling conventions and
views some accesses as loads and stores that are not.
This is mainly important when doing basic-block vectorization as
otherwise loop indexing would force such arguments to memory.

On x86 the reduction in the number of apparent loads and stores
often dominates cost analysis so the following tries to mitigate
this aggressively by adjusting only the scalar load and store
cost, reducing them to the cost of a simple scalar statement,
but not touching the vector access cost which would be much
harder to estimate. Thereby we error on the side of not performing
basic-block vectorization.

PR tree-optimization/116274
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
and stores as simple scalar stmts when they access a non-global,
not address-taken variable that doesn't have BLKmode assigned.

* gcc.target/i386/pr116274-2.c: New testcase.

Fortran: Fix [Coarray] ICE in conv_caf_send, at fortran/trans-intrinsic.c:1950 [PR84246]

Fix ICE caused by converted expression already being pointer by checking
for its type. Lift rewrite to caf_send completely into resolve and
prevent more temporary arrays.

PR fortran/84246

gcc/fortran/ChangeLog:

* resolve.cc (caf_possible_reallocate): Detect arrays that may
be reallocated by caf_send.
(resolve_ordinary_assign): More reliably detect assignments
where a rewrite to caf_send is needed.
* trans-expr.cc (gfc_trans_assignment_1): Remove rewrite to
caf_send, because this is done by resolve now.
* trans-intrinsic.cc (conv_caf_send): Prevent unneeded temporary
arrays.

libgfortran/ChangeLog:

* caf/single.c (send_by_ref): Created array's lbound is now 1
and the offset set correctly.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_allocate_7.f08: Adapt to array being
allocate by caf_send.

[optc-save-gen.awk] Fix streaming of command line options for offloading.

The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p)
check before streaming out target-specific opt in cl_optimization_stream_out,
when offloading is enabled.

Also, it modifies cl_optimization_stream_in to issue an error during build time
if accelerator backend defines a target-specific Optimization option. This
restriction currently is in place to maintain consistency for streaming of
Optimization options between host and accelerator. A proper fix would be
to merge target-specific Optimization options for host and accelerators
enabled for offloading.

gcc/ChangeLog:
* optc-save-gen.awk: New array var_target_opt. Use it to generate
if (!lto_stream_offload_p) check in cl_optimization_stream_out,
and generate a diagnostic with #error if accelerator backend uses
Optimization for target-specifc options in cl_optimization_stream_in.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

c++/modules: Disable streaming definitions of non-vague-linkage GMF decls [PR115020]

The error in the linked PR is caused because 'DECL_THIS_STATIC' is true
for the static member function, causing the streaming code to assume
that this is an internal linkage GM entity that needs to be explicitly
streamed, which then on read-in gets marked as a vague linkage function
(despite being non-inline) causing import_export_decl to complain.

However, I don't see any reason why we should care about this:
definitions in the GMF should just be emitted as per usual regardless of
whether they're internal-linkage or not. Actually the only thing we
care about here are header modules, since they have no TU to write
definitions into. As such this patch removes these conditions from
'has_definition' and updates some comments to clarify.

PR c++/115020

gcc/cp/ChangeLog:

* module.cc (has_definition): Only force writing definitions for
header_module_p.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr115020_a.C: New test.
* g++.dg/modules/pr115020_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Handle transitive reachability for deduction guides [PR116403]

Currently we implement [temp.deduct.guide] p1 by forcing all deduction
guides to be considered as exported. However this is not sufficient:
for transitive non-exported imports we will still hide the deduction
guide from name lookup, causing errors.

This patch instead adjusts name lookup to have a new ANY_REACHABLE flag
to allow for this case. Currently this is only used by deduction guides
but there are some other circumstances where this may be useful in the
future (e.g. finding existing temploid friends).

PR c++/116403

gcc/cp/ChangeLog:

* pt.cc (deduction_guides_for): Use ANY_REACHABLE for lookup of
deduction guides.
* module.cc (depset::hash::add_deduction_guides): Likewise.
(module_state::write_cluster): No longer override deduction
guides as exported.
* name-lookup.cc (name_lookup::search_namespace_only): Ignore
visibility when LOOK_want::ANY_REACHABLE is specified.
(check_module_override): Ignore visibility when checking for
ambiguating deduction guides.
* name-lookup.h (LOOK_want): New flag 'ANY_REACHABLE'.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dguide-4_a.C: New test.
* g++.dg/modules/dguide-4_b.C: New test.
* g++.dg/modules/dguide-4_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Avoid rechecking initializers when streaming NTTPs [PR116382]

When reading an NTTP we call get_template_parm_object which delegates
setting of DECL_INITIAL to the general cp_finish_decl procedure, which
calls check_initializer to validate and record it.

Apart from being unnecessary (it must have already been validated by the
writing module), this also causes errors in cases like the linked PR, as
validating may end up needing to call lazy_load_pendings to determine
any specialisations that may exist which violates assumptions of the
modules streaming code.

This patch works around the issue by adding a flag to
get_template_parm_object to disable these checks when not needed.

PR c++/116382

gcc/cp/ChangeLog:

* cp-tree.h (get_template_parm_object): Add check_init param.
* module.cc (trees_in::tree_node): Pass check_init=false when
building NTTPs.
* pt.cc (get_template_parm_object): Prevent cp_finish_decl from
validating the initializer when check_init=false.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-nttp-1_a.C: New test.
* g++.dg/modules/tpl-nttp-1_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Fix type lookup in DECL_TEMPLATE_INSTANTIATIONS [PR116364]

We need to use the DECL_TEMPLATE_INSTANTIATIONS property to find
reachable specialisations from a template to ensure that any GM
specialisations are properly marked as reachable.

Currently the modules code uses the decl when rebuilding this property,
but this is not always correct; it appears that for type specialisations
we need to use the TREE_TYPE of the decl instead so that the
specialisation is correctly found. This patch makes the required
adjustments.

PR c++/116364

gcc/cp/ChangeLog:

* cp-tree.h (get_mergeable_specialization_flags): Adjust
signature.
* module.cc (trees_out::decl_value): Indicate whether this is a
type or decl specialisation.
* pt.cc (get_mergeable_specialization_flags): Match against the
type of a non-decl specialisation.
(add_mergeable_specialization): Use the already calculated spec
instead of always adding decl to DECL_TEMPLATE_INSTANTIATIONS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-spec-9_a.C: New test.
* g++.dg/modules/tpl-spec-9_b.C: New test.
* g++.dg/modules/tpl-spec-9_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

m68k: Add -mlra

PR target/113939
* config/m68k/m68k.opt (mlra): New target option.
* config/m68k/m68k.cc (m68k_use_lra_p): New function.
(TARGET_LRA_P): Use it.
* config/m68k/m68k.opt.urls: Regenerate.

c++: ICE with enum and conversion fn in template [PR115657]

Here we initialize an enumerator with a class prvalue with a conversion
function. When we fold it in build_enumerator, we create a TARGET_EXPR
for the object, and subsequently crash in tsubst_expr, which should not
see such a code.

Normally, we fix similar problems by using an IMPLICIT_CONV_EXPR but here
I may get away with not using the result of fold_non_dependent_expr unless
the result is a constant. A TARGET_EXPR is not constant.

PR c++/115657

gcc/cp/ChangeLog:

* decl.cc (build_enumerator): Call maybe_fold_non_dependent_expr
instead of fold_non_dependent_expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-recursion2.C: New test.
* g++.dg/template/conv21.C: New test.

c++: fix ICE in convert_nontype_argument [PR116384]

Here we ICE since r14-8291 in C++11/C++14 modes.  Fortunately
this is an easy one.

The important bit of r14-8291 is this:

@@ -20056,9 +20071,12 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
        RETURN (retval);
      }
    if (IMPLICIT_CONV_EXPR_NONTYPE_ARG (t))
-     /* We'll pass this to convert_nontype_argument again, we don't need
-        to actually perform any conversion here.  */
-     RETURN (expr);
+     {
+       tree r = convert_nontype_argument (type, expr, complain);
+       if (r == NULL_TREE)
+         r = error_mark_node;
+       RETURN (r);
+     }

which obviously means that instead of returning right away we go
to convert_nontype_argument.  When type is error_mark_node and we're
in C++17, in convert_nontype_argument we go down this path:

      else if (INTEGRAL_OR_ENUMERATION_TYPE_P (type)
               || cxx_dialect >= cxx17)
        {
          expr = build_converted_constant_expr (type, expr, complain);
          if (expr == error_mark_node)
            return (complain & tf_error) ? NULL_TREE : error_mark_node;
  // ...
}

but pre-C++17, we take a different route and end up crashing on
gcc_unreachable.

It would of course also work to check for error_mark_node early in
build_converted_constant_expr.

PR c++/116384

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case IMPLICIT_CONV_EXPR>: Bail if tsubst
returns error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/vt-116384.C: New test.

aarch64: Fix ls64 intrinsic availability

The availability of ls64 intrinsics and data types were determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).

The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
feature check at initialisation.
(aarch64_general_check_builtin_call): Check ls64 intrinsics.
* config/aarch64/arm_acle.h: (data512_t) Make always available.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
* gcc.target/aarch64/acle/ls64_guard-4.c: New test.

aarch64: Fix memtag intrinsic availability

The availability of memtag intrinsics and data types were determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_memtag_builtins):
Define intrinsic names directly.
(aarch64_general_init_builtins): Move memtag intialisation...
(handle_arm_acle_h): ...to here, and remove feature check.
(aarch64_general_check_builtin_call): Check memtag intrinsics.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag)
(__arm_mte_exclude_tag, __arm_mte_ptrdiff)
(__arm_mte_increment_tag, __arm_mte_set_tag, __arm_mte_get_tag):
Remove.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
* gcc.target/aarch64/acle/memtag_guard-4.c: New test.

aarch64: Fix tme intrinsic availability

The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options). This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
Define intrinsic names directly.
(aarch64_general_init_builtins): Move tme initialisation...
(handle_arm_acle_h): ...to here, and remove feature check.
(aarch64_general_check_builtin_call): Check tme intrinsics.
* config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
(__ttest): Remove.
(_TMFAILURE_*): Define unconditionally.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/tme_guard-1.c: New test.
* gcc.target/aarch64/acle/tme_guard-2.c: New test.
* gcc.target/aarch64/acle/tme_guard-3.c: New test.
* gcc.target/aarch64/acle/tme_guard-4.c: New test.

aarch64: Move check_required_extensions

Move SVE extension checking functionality to aarch64-builtins.cc, so
that it can be shared by non-SVE intrinsics.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
(expand_builtin): Update calls to the below.
(report_missing_extension, report_missing_registers)
(check_required_extensions): Move out of aarch64_sve namespace,
rename, and move into...
* config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
(aarch64_report_missing_registers)
(aarch64_check_required_extensions) ...here.
* config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
Add prototype.

aarch64: Refactor check_required_extensions

Replace TARGET_GENERAL_REGS_ONLY check with an explicit check that
aarch64_isa_flags enables all required extensions. This will be more
flexible when repurposing this function for non-SVE intrinsics.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc
(check_required_registers): Remove target check and rename to...
(report_missing_registers): ...this.
(check_required_extensions): Refactor.

Allow coarrays in select type. [PR46371, PR56496]

Fix ICE when scalar coarrays are used in a select type. Prevent
coindexing in associate/select type/select rank selector expression.

gcc/fortran/ChangeLog:

PR fortran/46371
PR fortran/56496

* expr.cc (gfc_is_coindexed): Detect is coindexed also when
rewritten to caf_get.
* trans-stmt.cc (trans_associate_var): Always accept a
descriptor for coarrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/select_type_1.f90: New test.
* gfortran.dg/coarray/select_type_2.f90: New test.
* gfortran.dg/coarray/select_type_3.f90: New test.

gnat: fix lto-type-mismatch between C_Version_String and gnat_version_string [PR115917]

gcc/ada/ChangeLog:

PR ada/115917
* gnatvsn.ads: Add note about the duplication of this value in
version.c.
* version.c (VER_LEN_MAX): Define to the same value as
Gnatvsn.Ver_Len_Max.
(gnat_version_string): Use VER_LEN_MAX as bound.

aarch64: Reduce FP reassociation width for Neoverse V2 and set AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA

The fp reassociation width for Neoverse V2 was set to 6 since its
introduction and I guess it was empirically tuned. But since
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA was added the tree reassociation
pass seems to be more deliberate in forming FMAs and when that flag is
used it seems to more properly evaluate the FMA vs non-FMA reassociation
widths.
According to the Neoverse V2 SWOG the core has a throughput of 4 for
most FP operations, so the value 6 is not accurate anyway.
Also, the SWOG does state that FMADD operations are pipelined and the
results can be forwarded from FP multiplies to the accumulation operands
of FMADD instructions, which seems to be what
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA expresses.

This patch sets the fp_reassoc_width field to 4 and enables
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA for -mcpu=neoverse-v2.

On SPEC2017 fprate I see the following changes on a Grace system:
503.bwaves_r 0.16%
507.cactuBSSN_r -0.32%
508.namd_r 3.04%
510.parest_r 0.00%
511.povray_r 0.78%
519.lbm_r 0.35%
521.wrf_r 0.69%
526.blender_r -0.53%
527.cam4_r 0.84%
538.imagick_r 0.00%
544.nab_r -0.97%
549.fotonik3d_r -0.45%
554.roms_r 0.97%
Geomean 0.35%

with -Ofast -mcpu=grace -flto.

So slight overall improvement with a meaningful improvement in
508.namd_r.

I think other tunings in aarch64 should look into
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA as well, but I'll leave the
benchmarking to someone else.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:

* config/aarch64/tuning_models/neoversev2.h (fp_reassoc_width):
Set to 4.
(tune_flags): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.

testsuite: Prune warning about size of enums

This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1315.

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr33738-2.C: dg-prune arm linker messages about
size of enums.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

rtl: Enable the use of rtx values with int and mode attributes

The 'code' part of a 'define_code_attr' refers to the type of the key, in other
words, it uses a code_iterator to pick the 'value' from their (key "value") pair
list.

However, rtx_alloc_for_name requires a code_attribute to be used when the
'value' needs to be a type. In other words, no other type of attributes could be
used, before this patch, to produce a rtx typed 'value'.

This patch removes that restriction and allows the backend to use any kind of
attribute as long as that attribute always produces a valid code typed 'value'.

gcc/ChangeLog:

* read-rtl.cc (rtx_reader::rtx_alloc_for_name): Allow all attribute
types to produce code 'values'.
(check_code_attribute): Rename ...
(check_attribute_codes): ... to this. And change comments to refer to
* doc/md.texi: Add paragraph to document that you can use int and mode
attributes to produce codes.

testsuite: Reduce cut-&-paste in scanltranstree.exp

scanltranstree.exp defines some LTO wrappers around standard
non-LTO scanners.  Four of them are cut-&-paste variants of
one another, so this patch generates them from a single template.
It also does the same for scan-ltrans-tree-dump-times, so that
other *-times scanners can be added easily in future.

The scanners seem to be lightly used.  gcc.dg/ipa/ipa-icf-38.c uses
scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
uses scan-ltrans-tree-dump-{not,times}.  Nothing currently seems
to use scan-ltrans-tree-dump-dem*.

gcc/testsuite/
* lib/scanltranstree.exp: Redefine the routines using two
templates.

Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535 [PR84244]

Declaring an unused function with a derived type having a pointer
component and using that derived type as a coarray, lead the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.

PR fortran/84244

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
generated for a component, link it to the component it is
generated for (the previous one).

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/ptr_comp_5.f08: New test.

aarch64: Implement 16-byte vector mode const0 store by TImode

gcc/
* config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD):
Expand 16-byte vector mode const0 store by TImode.

AVX10.2 ymm rounding: Support vsqrtp{s,d,h} and vsubp{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vscalefp{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/sse.md:
(<avx512>_scalef<mode><mask_name><round_name>): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vreducep{s,d,h} and vrndscalep{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
Add condition check.
(<avx512>_rndscale<mode><mask_name><round_saeonly_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vmulp{s,d,h} and vrangep{s,d} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
Handle V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI_INT.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support v{max,min}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vgetexpp{s,d,h} and vgetmantp{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SF_V8SF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_UHI_INT, V16HF_FTYPE_V16HF_INT_V16HF_UHI_INT,
V4DF_FTYPE_V4DF_INT_V4DF_UQI_INT, V8SF_FTYPE_V8SF_INT_V8SF_UQI_INT.
* config/i386/sse.md:
(<avx512>_getexp<mode><mask_name><round_saeonly_name>):
Add condition check.
(<avx512>_getmant<mode><mask_name><round_saeonly_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vfnmsub{132,231,213}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fnmsub_<mode>_mask3<round_name>): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132,231,213}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vfm{sub,subadd}{132,231,213}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmsub_<mode>_mask<round_name>): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132,231,213}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmaddsub_<mode>_mask<round_name>): Add condition check.
(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.

AVX10.2 ymm rounding: Support vfmadd{132,231,213}p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmadd_<mode>_mask3<round_name>): Add condition check.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.

AVX10.2 ymm rounding: Support vfc{madd,mul}cph, vfixupimmp{s,d} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HF_V16HF_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI_INT,
V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI_INT.
* config/i386/sse.md:
(<avx512>_fixupimm<mode><sd_maskz_name><round_saeonly_name>):
Add condition check.
(<avx512>_fixupimm<mode>_mask<round_saeonly_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.

AVX10.2 ymm rounding: Support vcvt{,u}w2ph and vdivp{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HI_V16HF_UHI_INT.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.

AVX10.2 ymm rounding: Support vcvttps2{,u}{dq,qq} and vcvtu{dq,qq}2p{s,d,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(unspec_fix_truncv8sfv8si2<mask_name>): Extend rounding control.
(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
Ditto.
(<mask_codefor>floatuns<sseintvecmodelower><mode>2<mask_name><round_name>):
Add condition check.
(fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>):
Remove round_saeonly_name.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.

AVX10.2 ymm rounding: Support vcvttph2{,u}{dq,qq,w} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>):
Extend round control for 256bit.
(unspec_avx512fp16_fix<vcvtt_uns_suffix>_trunc<mode>2<mask_name>):
Ditto.
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>):
Add condition check.
* config/i386/subst.md
(round_saeonly_mode_condition): Add V16HI check for 256bit.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.

AVX10.2 ymm rounding: Support vcvtqq2p{s,d,h} and vcvttpd2{,u}{dq,qq} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DI_V4DF_UQI_INT, V4SF_FTYPE_V4DI_V4SF_UQI_INT,
V8HF_FTYPE_V4DI_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvt<floatsuffix>qq2ph_v4di_mask_round): New expand.
(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask):
Extend round control and add "_1" suffix.
(float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>):
Add condition check.
(float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>):
Ditto.
(float<floatunssuffix><mode><ssePSmode2lower>2<mask_name><round_name>):
Limit suffix output.
(unspec_fix_truncv4dfv4si2<mask_name>): Extend round control.
(unspec_fixuns_truncv4dfv4si2<mask_name>): Ditto.
* config/i386/subst.md (round_qq2pssuff): New iterator.
(round_saeonly_suff): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: New test.

AVX10.2 ymm rounding: Support vcvtps2{,u}{dq,qq} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SI_FTYPE_V8SF_V8SI_UQI_INT, V4DI_FTYPE_V4SF_V4DI_UQI_INT.
* config/i386/sse.md
(<sse2_avx_avx512f>_fix_notrunc<sf2simodelower><mode><mask_name>):
Extend to round.
(<mask_codefor><avx512>_fixuns_notrunc<sf2simodelower><mode><mask_name><round_name>):
Add round condition check.
* config/i386/subst.md (round_constraint4): New.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.

AVX10.2 ymm rounding: Support vcvtph2{,u}w and vcvtps2p{d,hx} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HI_FTYPE_V16HF_V16HI_UHI_INT, V4DF_FTYPE_V4SF_V4DF_UQI_INT
V8HF_FTYPE_V8SF_V8HF_UQI_INT.
* config/i386/sse.md
(avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>):
Add round condition check.
* config/i386/subst.md (round_mode_condition): Add V16HI check for
256bit.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.

AVX10.2 ymm rounding: Support vcvtph2p{s,d,sx} and vcvtph2{,u}{dq,qq} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8HF_V8SF_UQI_INT, V8SI_FTYPE_V8HF_V8SI_UQI_INT,
V4DF_FTYPE_V8HF_V4DF_UQI_INT, V4DI_FTYPE_V8HF_V4DI_UQI_INT.
* config/i386/sse.md:
(avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>):
Add condition check.
(avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode>
<mask_name><round_name>):
Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name>): Extend round saeonly.
(vcvtph2ps256<mask_name>): Ditto.
* config/i386/subst.md
(round_saeonly_applied): New condition.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.

AVX10.2 ymm rounding: Support vcvtpd2{,u}{dq,qq} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DI_FTYPE_V4DF_V4DI_UQI_INT, V4SI_FTYPE_V4DF_V4SI_UQI_INT.
* config/i386/sse.md:
(avx_cvtpd2dq256<mask_name>): Change name to
avx_cvtpd2dq256<mask_name><round_name> and extend pattern to
generate 256bit insns.
(fixuns_notrunc<mode><si2dfmodelower>2<mask_name><round_name>):
Add round_mode_condition.
* config/i386/subst.md (round_pd2udqsuff): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.

AVX10.2 ymm rounding: Support vcvtdq2p{s,h} and vcvtpd2p{s,h} intrins

gcc/ChangeLog:

* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SI_V8SF_UQI_INT, V4SF_FTYPE_V4DF_V4SF_UQI_INT,
V8HF_FTYPE_V8SI_V8HF_UQI_INT, V8HF_FTYPE_V4DF_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>):
Add condition check.
(avx512fp16_vcvtpd2ph_v4df_mask_round): New expand.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Change name to
avx512fp16_vcvt<castmode>2ph_<mode>_mask<round_name>_1
and extend pattern to generate 256bit insns.
(avx_cvtpd2ps256<mask_name>): Change name to
avx_cvtpd2ps256<mask_name><round_name> and extend pattern to
generate 256bit insns.
* config/i386/subst.md (round_applied): New condition.
(round_suff): New iterator.
(round_mode_condition): Add V32HI check for 512bit.
(round_saeonly_mode_condition): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.

AVX10.2 ymm rounding: Support vadd{s,d,h} and vcmp{s,d,h} intrins

gcc/ChangeLog:

* config.gcc: Add avx10_2roundingintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT, UQI_FTYPE_V4DF_V4DF_INT_UQI_INT,
UHI_FTYPE_V16HF_V16HF_INT_UHI_INT, UQI_FTYPE_V8SF_V8SF_INT_UQI_INT.
* config/i386/immintrin.h: Include avx10_2roundingintrin.h.
* config/i386/sse.md: Change subst_attr name due to renaming.
* config/i386/subst.md:
(<round_mode512bit_condition>): Add condition check for avx10.2
rounding control 256bit intrins and renamed to ...
(<round_mode_condition>): ...this.
(round_saeonly_mode512bit_condition): Add condition check for
avx10.2 rounding control 256 bit intris and renamed to ...
(round_saeonly_mode_condition): ...this.
* config/i386/avx10_2roundingintrin.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx10.2 and new builtin test.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Add new tests.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: New test.

Daily bump.

[PR rtl-optimization/115876] Avoid ubsan in ext-dce.cc

This fixes two general ubsan issues in ext-dce, both related to use-side
processsing of modes > DImode.

In ext_dce_process_uses we can be presented with something like this as a use
(subreg:SI (reg:TF) 12)

That will result in an out of range shift for a HOST_WIDE_INT object.  Where
this happens is safe to just break from the SET context and process the
subjects.  This will ultimately result in seeing (reg:TF) and we'll mark all
bit groups as live.

In carry_backpropagate we can be presented with a TImode shift (for example)
and the shift count can be > 63 for such a shift.  This naturally trips ubsan
as well as we're operating on 64 bit objects.

We can just return mmask in this case noting that every bit group is live.

The combination of these two fixes eliminates all the reported ubsan issues in
ext-dce seen in a bootstrap and regression test on x86.

While I was in there I went ahead and fixed the various hardcoded 63/64 values
to be HOST_BITS_PER_WIDE_INT based.

Bootstrapped and regression tested on x86 with no regressions.  Also built with
ubsan enabled and verified the build logs and testsuite logs don't call out any
issues in ext-dce anymore.

Pushing to the trunk.

PR rtl-optimization/115876
gcc
* ext-dce.cc (ext_dce_process_sets): Replace hardcoded 63/64 instances
with HOST_BITS_PER_WIDE_INT based values.
(carry_backpropagate): Handle modes with more bits than
HOST_BITS_PER_WIDE_INT gracefully, avoiding undefined behavior.
(ext_dce_process_uses): Handle subreg offsets which would result
in ubsan shifts gracefully, avoiding undefined behavior.

libstdc++: Remove note from the GCC 4.0.1 days

libstdc++-v3:
* doc/xml/manual/prerequisites.xml: Remove note from the
GCC 4.0.1 days.
* doc/html/manual/setup.html: Regenerate.

doc: Tweak gm2 mailing list address

gcc:
* doc/gm2.texi (Contributing): Tweak gm2 mailing list address.

PHIOPT: move factor_out_conditional_operation over to use gimple_match_op

To start working on more with expressions with more than one operand, converting
over to use gimple_match_op is needed.
The added side-effect here is factor_out_conditional_operation can now support
builtins/internal calls that has one operand without any extra code added.

Note on the changed testcases:
* pr87007-5.c: the test was testing testing for avoiding partial register stalls
for the sqrt and making sure there is only one zero of the register before the
branch, the phiopt would now merge the sqrt's so disable phiopt.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op
instead of manually extracting from/creating the gimple.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr87007-5.c: Disable phi-opt.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic

This allows to build and use IEEE modules on Darwin PowerPC.

libgfortran/ChangeLog:

* config/fpu-macppc.h (new file): initial support for powerpc-darwin.
* configure.host: enable ieee_support for powerpc-darwin case,
set fpu_host='fpu-macppc'.

Signed-off-by: Sergey Fedorov <vital.had@gmail.com>

AVR: Tweak 16-bit addition with const that didn't get a LD_REGS register.

The 16-bit additions like addhi3 have two forms: One with a scratch:QI
and one without, where the latter is required because reload cannot
deal with a scratch when spill code pops a 16-bit addition.

Passes like combine and fwprop1 may come up with the non-scratch version,
which is sub-optimal in the case when the addition is performed in a
NO_LD_REGS register because the operands will be spilled to LD_REGS.
Having a scratch:QI at disposal can lead to better code with less spills.

gcc/
* config/avr/avr.md (*add<mode>3_split) [!reload_completed]:
Add a scratch:QI to 16-bit additions with constant.

AVR: ad target/116407 - Fix linker error "relocation truncated to fit".

PR target/116407
gcc/
* config/avr/avr.md (*dec-and-branchhi!=-1.l.clobber):
Increase the additional jump offset to 2 words.

AVR: target/116407 - Fix linker error "relocation truncated to fit".

Some text peepholes output extra instructions prior to a branch
instruction and that increase the jump offset of backward branches.

PR target/116407
gcc/
* config/avr/avr-protos.h (avr_jump_mode): Add an int argument.
* config/avr/avr.cc (avr_jump_mode): Add an int argument to increase
the computed jump offset of backwards branches.
* config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1):
Increase the jump offset used by avr_jump_mode() as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr116407-2.c: New test.
* gcc.target/avr/torture/pr116407-4.c: New test.

forwprop: Also dce from added statements from gimple_simplify

This extends r14-3982-g9ea74d235c7e78 to also include the newly added statements
since some of them might be dead too (due to the way match and simplify works).
This was noticed while working on adding a new match and simplify pattern where a
new statement that got added was not being used.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-fold.cc (mark_lhs_in_seq_for_dce): New function.
(replace_stmt_with_simplification): Call mark_lhs_in_seq_for_dce
right before inserting the sequence.
(fold_stmt_1): Add dce_worklist argument, update call to
replace_stmt_with_simplification.
(fold_stmt): Add dce_worklist argument, update call to fold_stmt_1.
(fold_stmt_inplace): Update call to fold_stmt_1.
* gimple-fold.h (fold_stmt): Add bitmap argument.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Update call to fold_stmt.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

This patch would like to implement the quad and oct .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {                                        \
    bool overflow = x > (WT)(NT)(-1);      \
    return ((NT)x) | (NT)-overflow;        \
  }

DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)

Before this patch:
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   _Bool overflow;
   8   │   short unsigned int _1;
   9   │   short unsigned int _2;
  10   │   short unsigned int _3;
  11   │   uint16_t _6;
  12   │
  13   │ ;;   basic block 2, loop depth 0
  14   │ ;;    pred:       ENTRY
  15   │   overflow_5 = x_4(D) > 65535;
  16   │   _1 = (short unsigned int) x_4(D);
  17   │   _2 = (short unsigned int) overflow_5;
  18   │   _3 = -_2;
  19   │   _6 = _1 | _3;
  20   │   return _6;
  21   │ ;;    succ:       EXIT
  22   │
  23   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
   6   │ {
   7   │   uint16_t _6;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;    pred:       ENTRY
  11   │   _6 = .SAT_TRUNC (x_4(D)); [tail call]
  12   │   return _6;
  13   │ ;;    succ:       EXIT
  14   │
  15   │ }

The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2):
Add new pattern for quad truncation.
(ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]

For QI/HImode of .SAT_ADD,  the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected.  For example as
below code.

signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
  b[0] = -40;
  c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9;
  __builtin_printf("%d\n", c);
}

After expanding we have:

;; _6 = .SAT_ADD (_3, 9);
(insn 8 7 9 (set (reg:DI 143)
        (high:DI (symbol_ref:DI ("d") [flags 0x86]  <var_decl d>)))
     (nil))
(insn 9 8 10 (set (reg/f:DI 142)
        (mem/f/c:DI (lo_sum:DI (reg:DI 143)
                (symbol_ref:DI ("d") [flags 0x86]  <var_decl d>)) [1 d+0 S8 A64]))
     (nil))
(insn 10 9 11 (set (reg:HI 144 [ _3 ])
        (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) "test.c":7:10 -1
     (nil))

The convert from signed char to unsigned short will have sign_extend rtl
as above.  And finally become the lb insn as below:

lb      a1,0(a5)   // a1 is -40, aka 0xffffffffffffffd8
lui     a0,0x1a
addi    a5,a1,9
slli    a5,a5,0x30
srli    a5,a5,0x30 // a5 is 65505
sltu    a1,a5,a1   // compare 65505 and 0xffffffffffffffd8 => TRUE

The sltu try to compare 65505 and 0xffffffffffffffd8 here,  but we
actually want to compare 65505 and 65496 (0xffd8).  Thus we need to
clean up the high bits to ensure this.

The below test suites are passed for this patch:
* The rv64gcv fully regression test.

PR target/116278

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands 0
and remove the special handing for SImode in RV64.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_add-11.c: Adjust asm check body.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 3

This patch would like to add test cases for the unsigned scalar
.SAT_TRUNC form 3.  Aka:

Form 3:
  #define DEF_SAT_U_TRUC_FMT_3(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x <= max ? (NT)x : (NT) max;    \
  }

DEF_SAT_U_TRUC_FMT_3 (uint32_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-13.c: New test.
* gcc.target/riscv/sat_u_trunc-14.c: New test.
* gcc.target/riscv/sat_u_trunc-15.c: New test.
* gcc.target/riscv/sat_u_trunc-run-13.c: New test.
* gcc.target/riscv/sat_u_trunc-run-14.c: New test.
* gcc.target/riscv/sat_u_trunc-run-15.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 2

This patch would like to add test cases for the unsigned scalar
.SAT_TRUNC form 2.  Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x > max ? (NT) max : (NT)x;     \
  }

DEF_SAT_U_TRUC_FMT_2 (uint32_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-7.c: New test.
* gcc.target/riscv/sat_u_trunc-8.c: New test.
* gcc.target/riscv/sat_u_trunc-9.c: New test.
* gcc.target/riscv/sat_u_trunc-run-7.c: New test.
* gcc.target/riscv/sat_u_trunc-run-8.c: New test.
* gcc.target/riscv/sat_u_trunc-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[committed] Avoid right shifting signed value on ext-dce.cc

This is analogous to a prior patch to ext-dce which fixes propagation of sign
bits, but this time for the saturating variants.  I'd held off fixing those
because I wanted the time to look at that code (since we don't have a testcase
for it as far as I know).

Not surprisingly, putting an abort on that path and running an x86 bootstrap
and testsuite run, it never triggers.  Of course not a lot of code tries to do
saturating shifts.

Anyway, bootstrapped and regression tested on x86_64.  Pushing to the trunk.

Thanks for everyone's patience.

gcc/
* ext-dce.cc (carry_backpropagate): Cast mask to HOST_WIDE_INT before
shifting.

t-rtems: add rv32imf architecture to the RTEMS multilib for RISC-V

The attach patch is specific to the RTEMS RISC-V architecture multilib which is
controlled by the t-rtems file in the gcc/config/riscv/ directory. The patch
file was created from the gcc-13.3.0 branch. It was successfully tested within
RTEMS Source Builder.

gcc/
* config/riscv/t-rtems: Add ilp32f multilib.

Adjust v850 rotate expander to allow more cases for V850E3V5

The recent if-conversion changes tripped a failure on the v850 port.

The core underlying issue is that while the if-conversion code tries to do the
right thing with noce_can_force_operand to determine if it can force an
arbitrary operand into a register, it's not really a sufficient check.

Essentially for arithmetic codes, it checks the operands.  If the operands are
force-able and there's a code_to_optab mapping, then it returns true.

code_to_optab doesn't actually check anything other than the existence of  a
mapping in the target.  If the target pattern has restrictions enforced by the
condition or it's an expander that is allowed to FAIL, then
noce_can_force_operand to be true, even though we may not be able to directly
force the operand into a register.

This came up on the v850 when we had an operand that was a rotate by a constant
number of bits (I don't remember the count, all that's important about it was
the count was not 8 or 16).

The v850 port has this define_expand:

> (define_expand "rotlsi3"
>   [(parallel [(set (match_operand:SI 0 "register_operand" "")
>                    (rotate:SI (match_operand:SI 1 "register_operand" "")
>                               (match_operand:SI 2 "const_int_operand" "")))
>               (clobber (reg:CC CC_REGNUM))])]
>   "(TARGET_V850E_UP)"
>   {
>     if (INTVAL (operands[2]) != 16)
>       FAIL;
>   })
So the only rotate count allowed is 16 (there's a similar HI rotate with a count of 8).  AFAICT the rotate patterns are allowed to FAIL.  So naturally the expander fails and we get a testsuite regression:

> Tests that now fail, but worked before (4 tests):
>
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)

This patch works around the problem by allowing the rotates in additional
cases, particularly for the V850E3V5+ variants which have a general rotate
capability.  But let's be clear, this is just a workaround and I expect we're
going to have to revisit the code to test if an operand can be forced into a
register.

gcc/
* config/v850/v850.md (rotlsi3): Allow more cases for V850E3V5+.

RISC-V: Fix ICE for vector single-width integer multiply-add intrinsics

When rs1 is the immediate 0, the following ICE occurs:

error: unrecognizable insn:
(insn 8 5 12 2 (set (reg:RVVM1DI 134 [ <retval> ])
        (if_then_else:RVVM1DI (unspec:RVVMF64BI [
                    (const_vector:RVVMF64BI repeat [
                            (const_int 1 [0x1])
                       ])
                    (reg/v:DI 137 [ vl ])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 [0]))
                    (reg/v:RVVM1DI 136 [ vs2 ]))
                (reg/v:RVVM1DI 135 [ vd ]))
            (reg/v:RVVM1DI 135 [ vd ])))

gcc/ChangeLog:

* config/riscv/vector.md: Allow scalar operand to be 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-7.c: New test.
* gcc.target/riscv/rvv/base/bug-8.c: New test.

[RISC-V][PR target/116282] Stabilize pattern conditions

So as expected the core problem with target/116282 is that the cost of certain
constant synthesis cases varied depending on whether or not we're allowed to
generate new pseudos or not.

That in turn meant that in obscure cases an insn might change from recognizable
to unrecognizable and triggers the observed failure.

So we need to keep the cost stable, at least when called from a pattern's
condition.  So we pass another boolean down when necessary. I've tried to keep
API fallout minimized.

Built and tested  on rv32 in my tester.  Let's see what pre-commit testing has
to say though 🙂

Note this will also require a minor change to the in-flight constant synthesis
work.

PR target/116282
gcc/
* config/riscv/riscv-protos.h (riscv_const_insns): Add new argument.
* config/riscv/riscv.cc (riscv_build_integer): Add new argument
ALLOW_NEW_PSEUDOS.  Pass it down to recursive calls and check it
before using synthesis which allows new registers to be created.
(riscv_split_integer_cost): Pass new argument to riscv_build_integer.
(riscv_integer_cost): Add ALLOW_NEW_PSEUDOS argument, pass it down to
riscv_build_integer.
(riscv_legitimate_constant_p): Pass new argument to riscv_const_insns.
(riscv_const_insns): New argment ALLOW_NEW_PSEUDOS.  Pass it down to
riscv_integer_cost and riscv_const_insns.
(riscv_split_const_insns): Pass new argument to riscv_const_insns.
(riscv_move_integer, riscv_rtx_costs): Similarly.
* config/riscv/riscv.md (shadd with costly constant): Pass new argument
to riscv_const_insns.
* config/riscv/bitmanip.md (and with costly constant): Pass new argument
to riscv_const_insns.

gcc/testsuite/
* gcc.target/riscv/pr116282.c: New test.

RISC-V: Bugfix for RVV rounding intrinsic ICE in function checker

When compiling an interface for rounding of type 'vfloat16*' without using zvfh
or zvfhmin, it is not enough to use FLOAT_MODE_P because the type does not
support it. Although the subsequent riscv_validate_vector_type checks will
still fail and throw exceptions, I don't think we should have ICE here.

internal compiler error: in check, at config/riscv/riscv-vector-builtins-shapes.cc:444
   10 |   return __riscv_vfadd_vv_f16m1_rm (vs2, vs1, 0, vl);
      |   ^~~~~~
0x4191794 internal_error(char const*, ...)
        /iothome/jin.ma/code/master/gcc/gcc/diagnostic-global-context.cc:491
0x416ebf5 fancy_abort(char const*, int, char const*)
        /iothome/jin.ma/code/master/gcc/gcc/diagnostic.cc:1772
0x220aae6 riscv_vector::build_frm_base::check(riscv_vector::function_checker&) const
        /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:444
0x2205323 riscv_vector::function_checker::check()
        /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4414

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_vector_float_type_p): New.
* config/riscv/riscv-vector-builtins.cc (function_instance::any_type_float_p):
Use riscv_vector_float_type_p instead of FLOAT_MODE_P for judgment.
* config/riscv/riscv.cc (riscv_vector_int_type_p): Change static to extern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-9.c: New test.

RISC-V: Bugfix incorrect operand for vwsll auto-vect

This patch would like to fix one ICE when rv64gcv_zvbb for vwsll.
Consider below example.

void vwsll_vv_test (short *restrict dst, char *restrict a,
                    int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] << b[i];
}

It will hit the vwsll pattern with following operands.
operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ])
operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ])
operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ])

According to the ISA, operand 2 should be the same as operand 1.
Aka operand 2 should have RVVMF4QI mode as above.  Thus,  add
quad truncation for operand 2 before emit vwsll.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/116280

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add quad truncation to
align the mode requirement for vwsll.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr116280-1.c: New test.
* gcc.target/riscv/rvv/base/pr116280-2.c: New test.

RISC-V: Add auto-vect pattern for vector rotate shift

This patch add the vector rotate shift pattern for auto-vect.
With this patch, the scalar rotate shift can be automatically
vectorized into vector rotate shift.

gcc/ChangeLog:

* config/riscv/autovec.md (v<bitmanip_optab><mode>3):
Add new define_expand pattern for vector rotate shift.
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.

libstdc++: Update references to gcc.gnu.org/onlinedocs

libstdc++-v3:
* doc/xml/manual/abi.xml: Update reference to
gcc.gnu.org/onlinedocs.
* doc/xml/manual/concurrency_extensions.xml (interface): Ditto.
* doc/xml/manual/extensions.xml: Ditto.
* doc/xml/manual/parallel_mode.xml: Ditto.
* doc/xml/manual/shared_ptr.xml: Ditto.
* doc/xml/manual/using_exceptions.xml: Ditto. And change GNU GCC
to GCC.
* doc/html/manual/abi.html: Regenerate.
* doc/html/manual/ext_concurrency_impl.html: Ditto.
* doc/html/manual/ext_demangling.html: Ditto.
* doc/html/manual/memory.html: Ditto.
* doc/html/manual/parallel_mode_design.html: Ditto.
* doc/html/manual/parallel_mode_using.html: Ditto.
* doc/html/manual/using_exceptions.html: Ditto.

doc: Tweak PIM4 link

gcc:
* doc/gm2.texi (What is GNU Modula-2): Tweak PIM4 link.