git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: Fix sve/pcs/args_1.c failures [PR116604]

This test has been failing since r15-1619-g3b9b8d6cfdf593, which made
IRA prefer a call-clobbered register over a call-preserved register
for mem1 (the second load).  In this particular case, that just
forces the variable p3 to be allocated to a call-preserved register
instead, leading to an extra predicate move from p3 to that register.

However, it was really pot luck that this worked before.  Each argument
is used exactly once, so there isn't an obvious colouring order.
And mem0 and mem1 are passed by indirect reference, so they are not
REG_EQUIV to a stack slot in the way that some memory arguments are.

IIRC, the test was the result of some experimentation, and so I think
the best fix is to rework it to try to make it less sensitive to RA
decisions.  This patch does that by enabling scheduling for the
function and using both memory arguments in the same instruction.
This gets rid of the distracting prologue and epilogue code and
restricts the test to the PCS parts.

gcc/testsuite/
PR testsuite/116604
* gcc.target/aarch64/sve/pcs/args_1.c (callee_pred): Enable scheduling
and use both memory arguments in the same instruction.  Expect no
prologue and epilogue code.

testsuite: Fix sve/var_stride_*.c failures

gcc.target/aarch64/sve/var_stride_2.c started failing after
r15-268-g9dbff9c05520, but the change was an improvement:

@@ -36,13 +36,11 @@
b.any .L9
ret
.L17:
- ubfiz x5, x3, 10, 16
- ubfiz x4, x2, 10, 16
- add x5, x1, x5
- add x4, x0, x4
- cmp x0, x5
- ccmp x1, x4, 2, ls
uxtw x4, w2
+ add x6, x1, x3, lsl 10
+ cmp x0, x6
+ add x5, x0, x4, lsl 10
+ ccmp x1, x5, 2, ls
ccmp w2, 0, 4, hi
beq .L3
cntb x7

This patch therefore changes the test to expect the new output
for var_stride_2.c.

The changes for var_stride_4.c were a wash, with both versions
having 18(!) arithmetic instructions before the alias check branch.
Both versions sign-extend the n and m arguments as part of this
sequence; the question is whether they do it first or later.

This patch therefore changes the test to accept either the old
or the new code for var_stride_4.c.

gcc/testsuite/
* gcc.target/aarch64/sve/var_stride_2.c: Expect ADD+LSL.
* gcc.target/aarch64/sve/var_stride_4.c: Accept LSL or SBFIZ.

tree-optimization/118521 - bogus diagnostic from unreachable code

When SCCP does final value replacement we end up with unfolded IL like

__result_274 = _150 + 1;
...
__new_finish_106 = __result_274 + 3; <-- from SCCP
_115 = _150 + 4;
if (__new_finish_106 != _115)

this does only get rectified by the next full folding which happens
in forwprop4 which is after the strlen pass emitting the unwanted
diagnostic. The following mitigates this case in a similar way as
r15-7472 did for PR118817 - by ensuring we have the IL folded.
This is done by simply folding all immediate uses of the former
PHI def that SCCP replaces. All other more general approaches have
too much fallout at this point.

PR tree-optimization/118521
* tree-scalar-evolution.cc (final_value_replacement_loop):
Fold uses of the replaced PHI def.

* g++.dg/torture/pr118521.C: New testcase.

invoke.texi: Fix typo in the file-cache-lines param

file-cache-lines param was documented as file-cache-files. This fixes
the typo.

gcc/ChangeLog:

* doc/invoke.texi: Fix typo file-cache-files ->
file-cache-lines.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

libstdc++: Workaround Clang bug with __array_rank built-in [PR118559]

We started using the __array_rank built-in with r15-1252-g6f0dfa6f1acdf7
but that built-in is buggy in versions of Clang up to and including 19.

libstdc++-v3/ChangeLog:

PR libstdc++/118559
* include/std/type_traits (rank, rank_v): Do not use
__array_rank for Clang 19 and older.

libstdc++: Add parentheses around _GLIBCXX_HAS_BUILTIN definition

This allows _GLIBCXX_HAS_BUILTIN (or _GLIBCXX_USE_BUILTIN_TRAIT) to be
used as part of a larger logical expression.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAS_BUILTIN): Add parentheses.

libstdc++: Use new type-generic built-ins in <bit> [PR118855]

This makes several functions in <bit> faster to compile, with fewer
expressions to parse and fewer instantiations of __numeric_traits
required.

libstdc++-v3/ChangeLog:

PR libstdc++/118855
* include/std/bit (__count_lzero, __count_rzero, __popcount):
Use type-generic built-ins when available.

libstdc++: Fix invalid signed arguments to <bit> functions

These should have been unsigned, but the static assertions are only in
the public std::bit_ceil and std::bit_width functions, not the internal
__bit_ceil and __bit_width ones.

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h (__find_next_valid_abi): Cast
__bit_ceil argument to unsigned.
* src/c++17/floating_from_chars.cc (__floating_from_chars_hex):
Cast __bit_ceil argument to unsigned.
* src/c++17/memory_resource.cc (big_block): Cast __bit_width
argument to unsigned.

libstdc++: Remove workaround for reserved init_priority warnings

Since r15-7511-g4e7f74225116e7 we can disable the warnings for using a
reserved priority using a diagnostic pragma. That means we no longer
need to put globals using that attribute into separate files that get
included.

This replaces the two uses of such separate files by moving the variable
definition into the source file and adding the diagnostic pragma.

libstdc++-v3/ChangeLog:

* src/c++17/memory_resource.cc (default_res): Define here
instead of including default_resource.h.
* src/c++98/globals_io.cc (__ioinit): Define here instead of
including ios_base_init.h.
* src/c++17/default_resource.h: Removed.
* src/c++98/ios_base_init.h: Removed.

libstdc++: Use init_priority attribute for tzdb globals [PR118811]

When linking statically to libstdc++.a (or to libstdc++_nonshared.a in
the RHEL devtoolset compiler) there's a static initialization order
problem where user code might be constructed before the
std::chrono::tzdb_list globals, and so might try to use them after
they've already been destroyed.

Use the init_priority attribute on those globals so that they are
initialized early. Since r15-7511-g4e7f74225116e7 we can disable the
warnings for using a reserved priority using a diagnostic pragma.

libstdc++-v3/ChangeLog:

PR libstdc++/118811
* src/c++20/tzdb.cc (tzdb_list::_Node): Use init_priority
attribute on static data members.
* testsuite/std/time/tzdb_list/pr118811.cc: New test.

Fortran: Remove deprecated coarray routines [PR107635]

gcc/fortran/ChangeLog:

PR fortran/107635

* gfortran.texi: Remove deprecated functions from documentation.
* trans-decl.cc (gfc_build_builtin_function_decls): Remove
decprecated function decls.
* trans-intrinsic.cc (gfc_conv_intrinsic_exponent): Remove
deprecated/no longer needed routines.
* trans.h: Remove unused decls.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_get): Removed because deprecated.
(_gfortran_caf_send): Same.
(_gfortran_caf_sendget): Same.
(_gfortran_caf_send_by_ref): Same.
* caf/single.c (assign_char4_from_char1): Same.
(assign_char1_from_char4): Same.
(convert_type): Same.
(defined): Same.
(_gfortran_caf_get): Same.
(_gfortran_caf_send): Same.
(_gfortran_caf_sendget): Same.
(copy_data): Same.
(get_for_ref): Same.
(_gfortran_caf_get_by_ref): Same.
(send_by_ref): Same.
(_gfortran_caf_send_by_ref): Same.
(_gfortran_caf_sendget_by_ref): Same.

Fortran: Add transfer_between_remotes [PR107635]

Add the last missing coarray data manipulation routine using remote
accessors.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (rewrite_caf_send): Rewrite to
transfer_between_remotes when both sides of the assignment have
a coarray.
(coindexed_code_callback): Prevent duplicate rewrite.
* gfortran.texi: Add documentation for transfer_between_remotes.
* intrinsic.cc (add_subroutines): Add intrinsic symbol for
caf_sendget to allow easy rewrite to transfer_between_remotes.
* trans-decl.cc (gfc_build_builtin_function_decls): Add
prototype for transfer_between_remotes.
* trans-intrinsic.cc (conv_caf_vector_subscript_elem): Mark as
deprecated.
(conv_caf_vector_subscript): Same.
(compute_component_offset): Same.
(conv_expr_ref_to_caf_ref): Same.
(conv_stat_and_team): Extract stat and team from expr.
(gfc_conv_intrinsic_caf_get): Use conv_stat_and_team.
(conv_caf_send_to_remote): Same.
(has_ref_after_cafref): Mark as deprecated.
(conv_caf_sendget): Translate to transfer_between_remotes.
* trans.h: Add prototype for transfer_between_remotes.

libgfortran/ChangeLog:

* caf/libcaf.h: Add prototype for transfer_between_remotes.
* caf/single.c: Implement transfer_between_remotes.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Fix up scan_trees.

Fortran: Add send_to_remote [PR107635]

Refactor to use send_to_remote instead of the slow send_by_ref.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (move_coarray_ref): Move the coarray reference out
of the given one. Especially when there is a regular array ref.
(fixup_comp_refs): Move components refs to a derived type where
the codim has been removed, aka a new type.
(split_expr_at_caf_ref): Correctly split the reference chain.
(remove_caf_ref): Simplify.
(create_get_callback): Fix some deficiencies.
(create_allocated_callback): Adapt to new signature of split.
(create_send_callback): New function.
(rewrite_caf_send): Rewrite a call to caf_send to
caf_send_to_remote.
(coindexed_code_callback): Treat caf_send and caf_sendget
correctly.
* gfortran.h (enum gfc_isym_id): Add SENDGET-isym.
* gfortran.texi: Add documentation for send_to_remote.
* resolve.cc (gfc_resolve_code): No longer generate send_by_ref
when allocatable coarray (component) is on the lhs.
* trans-decl.cc (gfc_build_builtin_function_decls): Add
caf_send_to_remote decl.
* trans-intrinsic.cc (conv_caf_func_index): Ensure the static
variables created are not in a block-scope.
(conv_caf_send_to_remote): Translate caf_send_to_remote calls.
(conv_caf_send): Renamed to conv_caf_sendget.
(conv_caf_sendget): Renamed from conv_caf_send.
(gfc_conv_intrinsic_subroutine): Branch correctly for
conv_caf_send and sendget.
* trans.h: Correct decl.

libgfortran/ChangeLog:

* caf/libcaf.h: Add/Correct prototypes for caf_get_from_remote,
caf_send_to_remote.
* caf/single.c (struct accessor_hash_t): Rename accessor_t to
getter_t.
(_gfortran_caf_register_accessor): Use new name of getter_t.
(_gfortran_caf_send_to_remote): New function for sending data to
coarray on a remote image.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/send_char_array_1.f90: Extend test to
catch more cases.
* gfortran.dg/coarray_42.f90: Invert tests use, because no
longer a send is needed when local memory in a coarray is
allocated.

Fortran: Add caf_is_present_on_remote. [PR107635]

Replace caf_is_present by caf_is_present_on_remote which is using a
dedicated callback for each object to test on the remote image.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (create_allocated_callback): Add creating remote
side procedure for checking allocation status of coarray.
(rewrite_caf_allocated): Rewrite ALLOCATED on coarray to use caf
routine.
(coindexed_expr_callback): Exempt caf_is_present_on_remote from
being rewritten again.
* gfortran.h (enum gfc_isym_id): Add caf_is_present_on_remote
id.
* gfortran.texi: Add documentation for caf_is_present_on_remote.
* intrinsic.cc (add_functions): Add caf_is_present_on_remote
symbol.
* trans-decl.cc (gfc_build_builtin_function_decls): Define
interface of caf_is_present_on_remote.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_is_present_remote):
Translate caf_is_present_on_remote.
(trans_caf_is_present): Remove.
(caf_this_image_ref): Remove.
(gfc_conv_allocated): Take out coarray treatment, because that
is rewritten to caf_is_present_on_remote now.
(gfc_conv_intrinsic_function): Handle caf_is_present_on_remote
calls.
* trans.h: Add symbol for caf_is_present_on_remote and remove
old one.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_is_present_on_remote): Add new
function.
(_gfortran_caf_is_present): Remove deprecated one.
* caf/single.c (struct accessor_hash_t): Add function ptr access
for remote side call.
(_gfortran_caf_is_present_on_remote): Added.
(_gfortran_caf_is_present): Removed.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/coarray_allocated.f90: Adapt to new method
of checking on remote image.
* gfortran.dg/coarray_lib_alloc_4.f90: Same.

Fortran: Allow to use non-pure/non-elemental functions in coarray indexes [PR107635]

Extract calls to non-pure or non-elemental functions from index
expressions on a coarray.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (get_arrayspec_from_expr): Treat array result of
function calls correctly.
(remove_coarray_from_derived_type): Prevent memory loss.
(add_caf_get_from_remote): Correct locus.
(find_comp): New function to find or create a new component in a
derived type.
(check_add_new_comp_handle_array): Handle allocatable arrays or
non-pure/non-elemental functions in indexes of coarrays.
(check_add_new_component): Use above function.
(create_get_parameter_type): Rename to
create_caf_add_data_parameter_type.
(create_caf_add_data_parameter_type): Renaming of variable and
make the additional data a coarray.
(remove_caf_ref): Factor out to reuse in other caf-functions.
(create_get_callback): Use function factored out, set locus
correctly and ensure a kind is set for parameters.
(add_caf_get_intrinsic): Rename to add_caf_get_from_remote and
rename some variables.
(coindexed_expr_callback): Skip over function created by the
rewriter.
(coindexed_code_callback): Filter some intrinsics not to
process.
(gfc_coarray_rewrite): Rewrite also contained functions.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Reflect
changed order on caf_get_from_remote ().

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_register_accessor): Reflect
changed parameter order.
* caf/single.c (struct accessor_hash_t): Same.
(_gfortran_caf_register_accessor): Call accessor using a token
for accessing arrays with a descriptor on the source side.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Adapt scan expression.
* gfortran.dg/coarray/get_with_fn_parameter.f90: New test.
* gfortran.dg/coarray/get_with_scalar_fn.f90: New test.

Fortran: Prepare for more caf-rework. [PR107635]

Factor out generation of code to get remote function index and to
create the additional data structure. Rename caf_get_by_ct to
caf_get_from_remote.

gcc/fortran/ChangeLog:

PR fortran/107635

* gfortran.texi: Rename caf_get_by_ct to caf_get_from_remote.
* trans-decl.cc (gfc_build_builtin_function_decls): Rename
intrinsic.
* trans-intrinsic.cc (conv_caf_func_index): Factor out
functionality to be reused by other caf-functions.
(conv_caf_add_call_data): Same.
(gfc_conv_intrinsic_caf_get): Use functions factored out.
* trans.h: Rename intrinsic symbol.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_get_by_ref): Remove from ABI.
This function is replaced by caf_get_from_remote ().
(_gfortran_caf_get_remote_function_index): Use better name.
* caf/single.c (_gfortran_caf_finalize): Free internal data.
(_gfortran_caf_get_by_ref): Remove from public interface, but
keep it, because it is still used by sendget ().

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Adapt to renamed ABI
function.
* gfortran.dg/coarray_stat_function.f90: Same.
* gfortran.dg/coindexed_1.f90: Same.

Fortran: Move caf_get-rewrite to coarray.cc [PR107635]

Add a rewriter to keep all expression tree that is not optimization
together. At the moment this is just a move from resolve.cc, but will
be extended to handle more cases where rewriting the expression tree may
be easier. The first use case is to extract accessors for coarray
remote image data access.

gcc/fortran/ChangeLog:

PR fortran/107635
* Make-lang.in: Add coarray.cc.
* coarray.cc: New file.
* gfortran.h (gfc_coarray_rewrite): New procedure.
* parse.cc (rewrite_expr_tree): Add entrypoint for rewriting
expression trees.
* resolve.cc (gfc_resolve_ref): Remove caf_lhs handling.
(get_arrayspec_from_expr): Moved to rewrite.cc.
(remove_coarray_from_derived_type): Same.
(convert_coarray_class_to_derived_type): Same.
(split_expr_at_caf_ref): Same.
(check_add_new_component): Same.
(create_get_parameter_type): Same.
(create_get_callback): Same.
(add_caf_get_intrinsic): Same.
(resolve_variable): Remove caf_lhs handling.

libgfortran/ChangeLog:

* caf/single.c (_gfortran_caf_finalize): Free memory preventing
leaks.
(_gfortran_caf_get_by_ct): Fix constness.
* caf/libcaf.h (_gfortran_caf_register_accessor): Fix constness.

tree-optimization/86270 - improve SSA coalescing for loop exit test

The PR indicates a very specific issue with regard to SSA coalescing
failures because there's a pre IV increment loop exit test.  While
IVOPTs created the desired IL we later simplify the exit test into
the undesirable form again.  The following fixes this up during RTL
expansion where we try to improve coalescing of IVs.  That seems
easier that trying to avoid the simplification with some weird
heuristics (it could also have been written this way).

PR tree-optimization/86270
* tree-outof-ssa.cc (insert_backedge_copies): Pattern
match a single conflict in a loop condition and adjust
that avoiding the conflict if possible.

* gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
copies as well.

x86: Add a test for PR target/118936

Add a test for PR target/118936 which was fixed by reverting:

565d4e75549 i386: Simplify PARALLEL RTX scan in ix86_find_all_reg_use
11902be7a57 x86: Properly find the maximum stack slot alignment

PR target/118936
* gcc.target/i386/pr118936.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Revert "x86: Properly find the maximum stack slot alignment"

This reverts commit 11902be7a57c0ccf03786aa0255fffaf0f54dbf9.

Revert "i386: Simplify PARALLEL RTX scan in ix86_find_all_reg_use"

This reverts commit 565d4e755498ad2b5ed55e368ef61eb9511cda3a.

libstdc++: Rename concat_view::iterator to ::_Iterator

Even though 'iterator' is a reserved macro name, we can't use it as the
name of this implementation detail type since it could introduce name
lookup ambiguity in valid code, e.g.

struct A { using iterator = void; }
struct B : concat_view<...>, A { using type = iterator; };

libstdc++-v3/ChangeLog:

* include/std/ranges (concat_view::iterator): Rename to ...
(concat_view::_Iterator): ... this throughout.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Sync concat_view with final P2542 revision [PR115209]

Our concat_view implementation is accidentally based off of an older
revision of the paper, P2542R7 instead of R8. As far as I can tell the
only semantic change in the final revision is the relaxed constraints on
the iterator's iter/sent operator- overloads, which this patch updates.

This patch also simplifies the concat_view::end wording via C++26 pack
indexing as per the final revision. In turn we make the availability of
this library feature conditional on __cpp_pack_indexing. (Note pack
indexing is implemented in GCC 15 and Clang 19).

PR libstdc++/115209

libstdc++-v3/ChangeLog:

* include/bits/version.def (ranges_concat): Depend on
__cpp_pack_indexing.
* include/bits/version.h: Regenerate.
* include/std/ranges (__detail::__last_is_common): Remove.
(__detail::__all_but_first_sized): New.
(concat_view::end): Use C++26 pack indexing instead of
__last_is_common as per R8 of P2542.
(concat_view::iterator::operator-): Update constraints on
iter/sent overloads as per R8 of P2542.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

Daily bump.

GCN, nvptx: Support '--enable-languages=all'

..., where "support" means that the build doesn't fail, but it doesn't mean
that all target libraries get built and we get pretty test results for the
additional languages.

* configure.ac (unsupported_languages) [GCN, nvptx]: Add 'ada'.
(noconfigdirs) [GCN, nvptx]: Add 'target-libobjc',
'target-libffi', 'target-libgo'.
* configure: Regenerate.

AVR: Add new ISR test gcc.target/avr/torture/isr-04-regs.c.

gcc/testsuite/
* gcc.target/avr/torture/isr-04-regs.c: New test.
* gcc.target/avr/isr-test.h: Don't set GPRs to values
that are 0 mod 0x11.

aarch64: Fix testcase pr112105.c

This testcase started to fail with r15-268-g9dbff9c05520a7.
When late_combine was added, it was turned on for -O2+ only,
so this testcase still failed.
This changes the option to be -O2 instead of -O and the testcase
started to pass again.

tested for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr112105.c: Change to be -O2 rather
than -O1.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

input: give file_cache_slot its own copy of the file path [PR118919]

input.cc's file_cache was borrowing copies of the file name.
This could lead to use-after-free when writing out sarif output
from Fortran, which frees its filenames before the sarif output
is fully written out.

Fix by taking a copy in file_cache_slot.

gcc/ChangeLog:
PR other/118919
* input.cc (file_cache_slot::m_file_path): Make non-const.
(file_cache_slot::evict): Free m_file_path.
(file_cache_slot::create): Store a copy of file_path if non-null.
(file_cache_slot::~file_cache_slot): Free m_file_path.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle more IFN_UBSAN_* as no-ops [PR118300]

Previously the analyzer treated IFN_UBSAN_BOUNDS as a no-op, but
the other IFN_UBSAN_* were unrecognized and conservatively treated
as having arbitrary behavior.

Treat IFN_UBSAN_NULL and IFN_UBSAN_PTR also as no-ops, which should
make -fanalyzer behave better with -fsanitize=undefined.

gcc/analyzer/ChangeLog:
PR analyzer/118300
* kf.cc (class kf_ubsan_bounds): Replace this with...
(class kf_ubsan_noop): ...this.
(register_sanitizer_builtins): Use it to handle IFN_UBSAN_NULL,
IFN_UBSAN_BOUNDS, and IFN_UBSAN_PTR as nop-ops.
(register_known_functions): Drop handling of IFN_UBSAN_BOUNDS
here, as it's now handled by register_sanitizer_builtins above.

gcc/testsuite/ChangeLog:
PR analyzer/118300
* gcc.dg/analyzer/ubsan-pr118300.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Vect: Fix ICE when vect_verify_loop_lens acts on relevant mode [PR116351]

This patch would like to fix the ICE similar as below, assump we have
sample code:

 1 │ int a, b, c;
 2 │ short d, e, f;
 3 │ long g (long h) { return h; }
 4 │
 5 │ void i () {
 6 │ for (; b; ++b) {
 7 │ f = 5 >> a ? d : d << a;
 8 │ e &= c | g(f);
 9 │ }
 10 │ }

It will ice when compile with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl

during GIMPLE pass: vect
pr116351-1.c: In function ‘i’:
pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode,
at optabs-tree.cc:655
 8 | void i () {
 | ^
0x44d6b9d internal_error(char const*, ...)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, vec<int, va_heap, vl_ptr>*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
0x1fada40 vect_verify_loop_lens
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
0x1fb2b07 vect_analyze_loop_2
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
0x1fb4302 vect_analyze_loop_1
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
0x203c2dc try_vectorize_loop_1
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
0x203c839 try_vectorize_loop
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
0x203cb2c execute

During vectorization the override_widen pattern matched and then will get DImode
as vector_mode in loop_info. After that the loop_vinfo will step in vect_analyze_xx
with below flow:

vect_analyze_loop_2
|- vect_pattern_recog // over-widening and set loop_vinfo->vector_mode to DImode
|- ...
|- vect_analyze_loop_operations
 |- stmt_info->def_type == vect_reduction_def
 |- stmt_info->slp_type == pure_slp
 |- vectorizable_lc_phi // Not Hit
 |- vectorizable_induction // Not Hit
 |- vectorizable_reduction // Not Hit
 |- vectorizable_recurr // Not Hit
 |- vectorizable_live_operation // Not Hit
 |- vect_analyze_stmt
 |- stmt_info->relevant == vect_unused_in_scope
 |- stmt_info->live == false
 |- p pattern_stmt_info == (stmt_vec_info) 0x0
 |- return opt_result::success ();
 OR
 |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
 |- Early return opt_result::success ();
 |- vectorizable_load/store/call_convert/... // Not Hit
 |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS(loop_vinfo).is_empty ()
 |- vect_verify_loop_lens (loop_vinfo)
 |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode); // Hit assert result in ICE

Finally, the DImode in loop_vinfo will hit the assert (VECTOR_MODE_P (mode))
in vect_verify_loop_lens. This patch would like to return false
directly if the loop_vinfo has relevant mode like DImode for the ICE
fix, but still may have mis-optimization for similar cases. We will try
to cover that in separated patches.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

PR middle-end/116351

gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Return false if the
loop_vinfo has relevant mode such as DImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr116351-1.c: New test.
* gcc.target/riscv/rvv/base/pr116351-2.c: New test.
* gcc.target/riscv/rvv/base/pr116351.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

LoongArch: Use normal RTL pattern instead of UNSPEC for {x,}vsr{a,l}ri instructions

Allowing (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
shift operation.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
(UNSPEC_LASX_XVSRLRI): Remove.
(lasx_xvsrari_<lsxfmt>): Remove.
(lasx_xvsrlri_<lsxfmt>): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VSRARI): Remove.
(UNSPEC_LSX_VSRLRI): Remove.
(lsx_vsrari_<lsxfmt>): Remove.
(lsx_vsrlri_<lsxfmt>): Remove.
* config/loongarch/simd.md (simd_<optab>_imm_round_<mode>): New
define_insn.
(<simd_isa>_<x>v<insn>ri_<simdfmt>): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-shift-imm-round.c: New test.

LoongArch: Implement [su]dot_prod* for LSX and LASX modes

Despite it's just a special case of "a widening product of which the
result used for reduction," having these standard names allows to
recognize the dot product pattern earlier and it may be beneficial to
optimization. Also fix some test failures with the test cases:

- gcc.dg/vect/vect-reduc-chain-2.c
- gcc.dg/vect/vect-reduc-chain-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-4.c

gcc/ChangeLog:

* config/loongarch/simd.md (wvec_half): New define_mode_attr.
(<su>dot_prod<wvec_half><mode>): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan
DOT_PROD_EXPR in optimized tree.

LoongArch: Implement vec_widen_mult_{even,odd}_* for LSX and LASX modes

Since PR116142 has been fixed, now we can add the standard names so the
compiler will generate better code if the result of a widening
production is reduced.

gcc/ChangeLog:

* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_<su>mult_<even_odd>_<mode>): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-1.c: New test.
* gcc.target/loongarch/wide-mul-reduc-2.c: New test.

LoongArch: Simplify lsx_vpick description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.

This is not suitable for LASX where lasx_xvpick has a different
semantic.

gcc/ChangeLog:

* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_i): Make it same as simdfmt for integer vector
modes.
(_f): New define_mode_attr.
* config/loongarch/lsx.md (lsx_vpickev_b): Remove.
(lsx_vpickev_h): Remove.
(lsx_vpickev_w): Remove.
(lsx_vpickev_w_f): Remove.
(lsx_vpickod_b): Remove.
(lsx_vpickod_h): Remove.
(lsx_vpickod_w): Remove.
(lsx_vpickev_w_f): Remove.
(lsx_pick_evod_<mode>): New define_insn.
(lsx_<x>vpick<ev_od>_<simdfmt_as_i><_f>): New
define_expand.

LoongArch: Simplify {lsx_,lasx_x}vmaddw description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove.
(UNSPEC_LASX_XVMADDWEV2): Remove.
(UNSPEC_LASX_XVMADDWEV3): Remove.
(UNSPEC_LASX_XVMADDWOD): Remove.
(UNSPEC_LASX_XVMADDWOD2): Remove.
(UNSPEC_LASX_XVMADDWOD3): Remove.
(lasx_xvmaddwev_h_b): Remove.
(lasx_xvmaddwev_w_h): Remove.
(lasx_xvmaddwev_d_w): Remove.
(lasx_xvmaddwev_q_d): Remove.
(lasx_xvmaddwod_h_b): Remove.
(lasx_xvmaddwod_w_h): Remove.
(lasx_xvmaddwod_d_w): Remove.
(lasx_xvmaddwod_q_d): Remove.
(lasx_xvmaddwev_q_du): Remove.
(lasx_xvmaddwod_q_du): Remove.
(lasx_xvmaddwev_h_bu_b): Remove.
(lasx_xvmaddwev_w_hu_h): Remove.
(lasx_xvmaddwev_d_wu_w): Remove.
(lasx_xvmaddwev_q_du_d): Remove.
(lasx_xvmaddwod_h_bu_b): Remove.
(lasx_xvmaddwod_w_hu_h): Remove.
(lasx_xvmaddwod_d_wu_w): Remove.
(lasx_xvmaddwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove.
(UNSPEC_LSX_VMADDWEV2): Remove.
(UNSPEC_LSX_VMADDWEV3): Remove.
(UNSPEC_LSX_VMADDWOD): Remove.
(UNSPEC_LSX_VMADDWOD2): Remove.
(UNSPEC_LSX_VMADDWOD3): Remove.
(lsx_vmaddwev_h_b): Remove.
(lsx_vmaddwev_w_h): Remove.
(lsx_vmaddwev_d_w): Remove.
(lsx_vmaddwev_q_d): Remove.
(lsx_vmaddwod_h_b): Remove.
(lsx_vmaddwod_w_h): Remove.
(lsx_vmaddwod_d_w): Remove.
(lsx_vmaddwod_q_d): Remove.
(lsx_vmaddwev_q_du): Remove.
(lsx_vmaddwod_q_du): Remove.
(lsx_vmaddwev_h_bu_b): Remove.
(lsx_vmaddwev_w_hu_h): Remove.
(lsx_vmaddwev_d_wu_w): Remove.
(lsx_vmaddwev_q_du_d): Remove.
(lsx_vmaddwod_h_bu_b): Remove.
(lsx_vmaddwod_w_hu_h): Remove.
(lsx_vmaddwod_d_wu_w): Remove.
(lsx_vmaddwod_q_du_d): Remove.
* config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>):
New define_insn.
(<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt>): New
define_expand.
(simd_maddw_evod_<mode>_hetero): New define_insn.
(<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>):
New define_expand.
(<simd_isa>_maddw<ev_od>_q_d_punned): New define_expand.
(<simd_isa>_maddw<ev_od>_q_du_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it
with the punned expand.
(CODE_FOR_lsx_vmaddwev_q_du): Likewise.
(CODE_FOR_lsx_vmaddwev_q_du_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise.

LoongArch: Simplify {lsx_,lasx_x}vh{add,sub}w description

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX_XVHADDW_QU_DU): Remove.
(UNSPEC_LASX_XVHSUBW_QU_DU): Remove.
(lasx_xvh<addsub:optab>w_h_b): Remove.
(lasx_xvh<addsub:optab>w_w_h): Remove.
(lasx_xvh<addsub:optab>w_d_w): Remove.
(lasx_xvhaddw_q_d): Remove.
(lasx_xvhsubw_q_d): Remove.
(lasx_xvhaddw_qu_du): Remove.
(lasx_xvhsubw_qu_du): Remove.
(reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead
of gen_lasx_xvhaddw_q_d.
(reduc_plus_scal_v8si): Likewise.
* config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove.
(UNSPEC_ASX_VHSUBW_Q_D): Remove.
(UNSPEC_ASX_VHADDW_QU_DU): Remove.
(UNSPEC_ASX_VHSUBW_QU_DU): Remove.
(lsx_vh<addsub:optab>w_h_b): Remove.
(lsx_vh<addsub:optab>w_w_h): Remove.
(lsx_vh<addsub:optab>w_d_w): Remove.
(lsx_vhaddw_q_d): Remove.
(lsx_vhsubw_q_d): Remove.
(lsx_vhaddw_qu_du): Remove.
(lsx_vhsubw_qu_du): Remove.
(reduc_plus_scal_v2di): Change the temporary register mode to
V1TI, and pun the mode calling gen_vec_extractv2didi.
(reduc_plus_scal_v4si): Change the temporary register mode to
V1TI.
* config/loongarch/simd.md (simd_h<optab>w_<mode>_<su>): New
define_insn.
(<simd_isa>_<x>vh<optab>w_<simdfmt_w>_<simdfmt>): New
define_expand.
(<simd_isa>_h<optab>w_q_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with
punned expand.
(CODE_FOR_lsx_vhaddw_qu_du): Likewise.
(CODE_FOR_lsx_vhsubw_q_d): Likewise.
(CODE_FOR_lsx_vhsubw_qu_du): Likewise.
(CODE_FOR_lasx_xvhaddw_q_d): Likewise.
(CODE_FOR_lasx_xvhaddw_qu_du): Likewise.
(CODE_FOR_lasx_xvhsubw_q_d): Likewise.
(CODE_FOR_lasx_xvhsubw_qu_du): Likewise.

LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description

These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to match the const
vectors for odd/even indices for define_insn's, and generate those
vectors in define_expand's.

For "backward compatibilty" we need to provide a "punned" version for
the operations invoking TImode vectors as the intrinsics still expect
DImode vectors.

The stat is "201 insertions, 905 deletions."

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove.
(UNSPEC_LASX_XVADDWEV2): Remove.
(UNSPEC_LASX_XVADDWEV3): Remove.
(UNSPEC_LASX_XVSUBWEV): Remove.
(UNSPEC_LASX_XVSUBWEV2): Remove.
(UNSPEC_LASX_XVMULWEV): Remove.
(UNSPEC_LASX_XVMULWEV2): Remove.
(UNSPEC_LASX_XVMULWEV3): Remove.
(UNSPEC_LASX_XVADDWOD): Remove.
(UNSPEC_LASX_XVADDWOD2): Remove.
(UNSPEC_LASX_XVADDWOD3): Remove.
(UNSPEC_LASX_XVSUBWOD): Remove.
(UNSPEC_LASX_XVSUBWOD2): Remove.
(UNSPEC_LASX_XVMULWOD): Remove.
(UNSPEC_LASX_XVMULWOD2): Remove.
(UNSPEC_LASX_XVMULWOD3): Remove.
(lasx_xv<addsubmul:optab>wev_h_b): Remove.
(lasx_xv<addsubmul:optab>wev_w_h): Remove.
(lasx_xv<addsubmul:optab>wev_d_w): Remove.
(lasx_xvaddwev_q_d): Remove.
(lasx_xvsubwev_q_d): Remove.
(lasx_xvmulwev_q_d): Remove.
(lasx_xv<addsubmul:optab>wod_h_b): Remove.
(lasx_xv<addsubmul:optab>wod_w_h): Remove.
(lasx_xv<addsubmul:optab>wod_d_w): Remove.
(lasx_xvaddwod_q_d): Remove.
(lasx_xvsubwod_q_d): Remove.
(lasx_xvmulwod_q_d): Remove.
(lasx_xvaddwev_q_du): Remove.
(lasx_xvsubwev_q_du): Remove.
(lasx_xvmulwev_q_du): Remove.
(lasx_xvaddwod_q_du): Remove.
(lasx_xvsubwod_q_du): Remove.
(lasx_xvmulwod_q_du): Remove.
(lasx_xv<addmul:optab>wev_h_bu_b): Remove.
(lasx_xv<addmul:optab>wev_w_hu_h): Remove.
(lasx_xv<addmul:optab>wev_d_wu_w): Remove.
(lasx_xv<addmul:optab>wod_h_bu_b): Remove.
(lasx_xv<addmul:optab>wod_w_hu_h): Remove.
(lasx_xv<addmul:optab>wod_d_wu_w): Remove.
(lasx_xvaddwev_q_du_d): Remove.
(lasx_xvsubwev_q_du_d): Remove.
(lasx_xvmulwev_q_du_d): Remove.
(lasx_xvaddwod_q_du_d): Remove.
(lasx_xvsubwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove.
(UNSPEC_LSX_VADDWEV2): Remove.
(UNSPEC_LSX_VADDWEV3): Remove.
(UNSPEC_LSX_VSUBWEV): Remove.
(UNSPEC_LSX_VSUBWEV2): Remove.
(UNSPEC_LSX_VMULWEV): Remove.
(UNSPEC_LSX_VMULWEV2): Remove.
(UNSPEC_LSX_VMULWEV3): Remove.
(UNSPEC_LSX_VADDWOD): Remove.
(UNSPEC_LSX_VADDWOD2): Remove.
(UNSPEC_LSX_VADDWOD3): Remove.
(UNSPEC_LSX_VSUBWOD): Remove.
(UNSPEC_LSX_VSUBWOD2): Remove.
(UNSPEC_LSX_VMULWOD): Remove.
(UNSPEC_LSX_VMULWOD2): Remove.
(UNSPEC_LSX_VMULWOD3): Remove.
(lsx_v<addsubmul:optab>wev_h_b): Remove.
(lsx_v<addsubmul:optab>wev_w_h): Remove.
(lsx_v<addsubmul:optab>wev_d_w): Remove.
(lsx_vaddwev_q_d): Remove.
(lsx_vsubwev_q_d): Remove.
(lsx_vmulwev_q_d): Remove.
(lsx_v<addsubmul:optab>wod_h_b): Remove.
(lsx_v<addsubmul:optab>wod_w_h): Remove.
(lsx_v<addsubmul:optab>wod_d_w): Remove.
(lsx_vaddwod_q_d): Remove.
(lsx_vsubwod_q_d): Remove.
(lsx_vmulwod_q_d): Remove.
(lsx_vaddwev_q_du): Remove.
(lsx_vsubwev_q_du): Remove.
(lsx_vmulwev_q_du): Remove.
(lsx_vaddwod_q_du): Remove.
(lsx_vsubwod_q_du): Remove.
(lsx_vmulwod_q_du): Remove.
(lsx_v<addmul:optab>wev_h_bu_b): Remove.
(lsx_v<addmul:optab>wev_w_hu_h): Remove.
(lsx_v<addmul:optab>wev_d_wu_w): Remove.
(lsx_v<addmul:optab>wod_h_bu_b): Remove.
(lsx_v<addmul:optab>wod_w_hu_h): Remove.
(lsx_v<addmul:optab>wod_d_wu_w): Remove.
(lsx_vaddwev_q_du_d): Remove.
(lsx_vsubwev_q_du_d): Remove.
(lsx_vmulwev_q_du_d): Remove.
(lsx_vaddwod_q_du_d): Remove.
(lsx_vsubwod_q_du_d): Remove.
(lsx_vmulwod_q_du_d): Remove.
* config/loongarch/loongarch-modes.def: Add V4TI and V1DI.
* config/loongarch/loongarch-protos.h
(loongarch_gen_stepped_int_parallel): New function prototype.
* config/loongarch/loongarch.cc (loongarch_print_operand):
Accept 'O' for printing "ev" or "od."
(loongarch_gen_stepped_int_parallel): Implement.
* config/loongarch/predicates.md
(vect_par_cnst_even_or_odd_half): New define_predicate.
* config/loongarch/simd.md (WVEC_HALF): New define_mode_attr.
(simdfmt_w): Likewise.
(zero_one): New define_int_iterator.
(ev_od): New define_int_attr.
(simd_<optab>w_evod_<mode:IVEC>_<su>): New define_insn.
(<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt>): New
define_expand.
(simd_<optab>w_evod_<mode>_hetero): New define_insn.
(<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>):
New define_expand.
(DIVEC): New define_mode_iterator.
(<simd_isa>_<optab>w<ev_od>_q_d_punned): New define_expand.
(<simd_isa>_<optab>w<ev_od>_q_du_d_punned): Likewise.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vaddwev_q_d): Define as a macro to override it
with the punned expand.
(CODE_FOR_lsx_vaddwev_q_du): Likewise.
(CODE_FOR_lsx_vsubwev_q_d): Likewise.
(CODE_FOR_lsx_vsubwev_q_du): Likewise.
(CODE_FOR_lsx_vmulwev_q_d): Likewise.
(CODE_FOR_lsx_vmulwev_q_du): Likewise.
(CODE_FOR_lsx_vaddwod_q_d): Likewise.
(CODE_FOR_lsx_vaddwod_q_du): Likewise.
(CODE_FOR_lsx_vsubwod_q_d): Likewise.
(CODE_FOR_lsx_vsubwod_q_du): Likewise.
(CODE_FOR_lsx_vmulwod_q_d): Likewise.
(CODE_FOR_lsx_vmulwod_q_du): Likewise.
(CODE_FOR_lsx_vaddwev_q_du_d): Likewise.
(CODE_FOR_lsx_vmulwev_q_du_d): Likewise.
(CODE_FOR_lsx_vaddwod_q_du_d): Likewise.
(CODE_FOR_lsx_vmulwod_q_du_d): Likewise.
(CODE_FOR_lasx_xvaddwev_q_d): Likewise.
(CODE_FOR_lasx_xvaddwev_q_du): Likewise.
(CODE_FOR_lasx_xvsubwev_q_d): Likewise.
(CODE_FOR_lasx_xvsubwev_q_du): Likewise.
(CODE_FOR_lasx_xvmulwev_q_d): Likewise.
(CODE_FOR_lasx_xvmulwev_q_du): Likewise.
(CODE_FOR_lasx_xvaddwod_q_d): Likewise.
(CODE_FOR_lasx_xvaddwod_q_du): Likewise.
(CODE_FOR_lasx_xvsubwod_q_d): Likewise.
(CODE_FOR_lasx_xvsubwod_q_du): Likewise.
(CODE_FOR_lasx_xvmulwod_q_d): Likewise.
(CODE_FOR_lasx_xvmulwod_q_du): Likewise.
(CODE_FOR_lasx_xvaddwev_q_du_d): Likewise.
(CODE_FOR_lasx_xvmulwev_q_du_d): Likewise.
(CODE_FOR_lasx_xvaddwod_q_du_d): Likewise.
(CODE_FOR_lasx_xvmulwod_q_du_d): Likewise.

LoongArch: Allow moving TImode vectors

We have some vector instructions for operations on 128-bit integer, i.e.
TImode, vectors. Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.

For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX registers so we won't get a reload failure when we start to
save TImode vectors in these registers.

This implicitly depends on the vrepli optimization: without it we'd try
"vrepli.q" which does not really exist and trigger an ICE.

gcc/ChangeLog:

* config/loongarch/lsx.md (mov<LSX:mode>): Remove.
(movmisalign<LSX:mode>): Remove.
(mov<LSX:mode>_lsx): Remove.
* config/loongarch/lasx.md (mov<LASX:mode>): Remove.
(movmisalign<LASX:mode>): Remove.
(mov<LASX:mode>_lasx): Remove.
* config/loongarch/loongarch-modes.def (V1TI): Add.
(V2TI): Mention in the comment.
* config/loongarch/loongarch.md (mode): Add V1TI and V2TI.
* config/loongarch/simd.md (ALLVEC_TI): New mode iterator.
(mov<ALLVEC_TI:mode): New define_expand.
(movmisalign<ALLVEC_TI:mode>): Likewise.
(mov<ALLVEC_TI:mode>_simd): New define_insn_and_split.

LoongArch: Try harder using vrepli instructions to materialize const vectors

For

  a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}

we just want

  vrepli.b $vr0, 0xdd

but the compiler actually produces a load:

  la.local $r14,.LC0
  vld      $vr0,$r14,0

It's because we only tried vrepli.d which wouldn't work.  Try all vrepli
instructions for const int vector materializing to fix it.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_const_vector_vrepli): New function prototype.
* config/loongarch/loongarch.cc (loongarch_const_vector_vrepli):
Implement.
(loongarch_const_insns): Call loongarch_const_vector_vrepli
instead of loongarch_const_vector_same_int_p.
(loongarch_split_vector_move_p): Likewise.
(loongarch_output_move): Use loongarch_const_vector_vrepli to
pun operend[1] into a better mode if it's a const int vector,
and decide the suffix of [x]vrepli with the new mode.
* config/loongarch/constraints.md (YI): Call
loongarch_const_vector_vrepli instead of
loongarch_const_vector_same_int_p.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vrepli.c: New test.

LoongArch: Accept ADD, IOR or XOR when combining objects with no bits in common [PR115478]

Since r15-1120, multi-word shifts/rotates produces PLUS instead of IOR.
It's generally a good thing (allowing to use our alsl instruction or
similar instrunction on other architectures), but it's preventing us
from using bytepick. For example, if we shift a __int128 by 16 bits,
the higher word can be produced via a single bytepick.d instruction with
immediate 2, but we got:

srli.d $r12,$r4,48
slli.d $r5,$r5,16
slli.d $r4,$r4,16
add.d $r5,$r12,$r5
jr $r1

This wasn't work with GCC 14, but after r15-6490 it's supposed to work
if IOR was used instead of PLUS.

To fix this, add a code iterator to match IOR, XOR, and PLUS and use it
instead of just IOR if we know the operands have no overlapping bits.

gcc/ChangeLog:

PR target/115478
* config/loongarch/loongarch.md (any_or_plus): New
define_code_iterator.
(bstrins_<mode>_for_ior_mask): Use any_or_plus instead of ior.
(bytepick_w_<bytepick_imm>): Likewise.
(bytepick_d_<bytepick_imm>): Likewise.
(bytepick_d_<bytepick_imm>_rev): Likewise.

gcc/testsuite/ChangeLog:

PR target/115478
* gcc.target/loongarch/bytepick_shift_128.c: New test.

[PR middle-end/113525] Drop obsolete options from documentation

The sibling and unshare passes were dropped as distinct passes 10+ years ago.
Docs weren't ever updated. This just removes them; given their age I don't
think we need to keep them around any longer.

PR middle-end/113525

gcc/
* doc/invoke.texi (dump-rtl-sibling): Drop documentation for pass
removed long ago.
(dump-rtl-unshare): Likewise.

Daily bump.

Fix description of file-cache-lines/file-cache-files params

The file-cache-lines / file-cache-files tunables were documented in the
wrong section. Fix that.

Reported-by: Filip Kastl
Comitted as obvious.

gcc/ChangeLog:

* doc/invoke.texi:

analyzer: add more properties to sarif output

Add some more properties to the analyzer's sarif output, to
help with debugging -fanalyzer.

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(saved_diagnostic::maybe_add_sarif_properties): Add various
properties for debugging, for m_stmt, m_var, and m_duplicates.
Remove stray 'if' statement. Capture the kind of the
pending_diagnostic.
* region-model.cc
(poisoned_value_diagnostic::maybe_add_sarif_properties): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif output: fix alphabetization in sarif_scheme_handler::make_sink

No functional change intended.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/ChangeLog:
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Put
properties in alphabetical order.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

When gcc is built for x86_64-linux-musl target, stack unwinding from
within signal handler stops at the innermost signal frame. The reason
for this behaviro is that the signal trampoline is not accompanied with
appropiate CFI directives, and the fallback path in libgcc to recognize
it by the code sequence is only enabled for glibc except 2.0. The
latter is motivated by the lack of sys/ucontext.h in that glibc version.

Given that all relevant libc-s ship sys/ucontext.h for over a decade,
and that other arches aren't shy of unconditionally using it, follow
suit and remove the preprocessor condition, too.

libgcc/ChangeLog:

* config/i386/linux-unwind.h: Remove preprocessor
condition to enable fallback path for all libc-s.

Signed-off-by: Roman Kagan <rkagan@amazon.de>

RISC-V: Fix ratio in vsetvl fuse rule [PR115703].

In PR115703 we fuse two vsetvls:

    Fuse curr info since prev info compatible with it:
      prev_info: VALID (insn 438, bb 2)
        Demand fields: demand_ge_sew demand_non_zero_avl
        SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)
        VL=(reg:DI 9 s1 [312])
      curr_info: VALID (insn 92, bb 20)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 4 [0x4])
        VL=(nil)
      prev_info after fused: VALID (insn 438, bb 2)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 4 [0x4])
        VL=(nil).

The result is vsetvl zero, zero, e64, mf2, ta, ma.  The previous vsetvl
set vl = 4 but here we wrongly set it to vl = 2.  As all the following
vsetvls only ever change the ratio we never recover.

The issue is quite difficult to trigger because we can often
deduce the value of d at runtime.  Then very check for the value of
d will be optimized away.

The last known bad commit is r15-3458-g5326306e7d9d36.  With that commit
the output is wrong but -fno-schedule-insns makes it correct.  From the
next commit on the issue is latent.  I still added the PR's test as scan
and run check even if they don't trigger right now.  Not sure if the
run test will ever fail but well.  I verified that the
patch fixes the issue when applied on top of r15-3458-g5326306e7d9d36.

PR target/115703

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the
new LMUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr115703-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr115703.c: New test.

testsuite: Include stdint.h instead of stdint-gcc.h in some tests

When use_gcc_stdint=provide, the stdint-gcc.h header is not provided.

2025-02-18 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

PR testsuite/116986
* gcc.dg/crc-builtin-rev-target32.c: Include stdint.h
instead of stdint-gcc.h.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/crc-builtin-target32.c: Likewise.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/torture/pr115387-2.c: Likewise.

gfortran.dg/gomp/metadirective-3.f90: xfail on offload_nvptx

Currently, 'target' with a nested metadirective creating a 'teams' will
fail with a bogus error ("‘target’ construct with nested ‘teams’ construct
contains directives outside of the ‘teams’ construct").
That's tracked at PR118694 - and, hence, expected.

However, the testcase metadirective-3.f90 triggers this when compiling for
'target offload_nvptx' (otherwise, the code is optimized away). Use xfail to
silence the error as it is known and there is a tracking PR.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/metadirective-3.f90: Add xfail when
compiling for offload_nvptx.

late-combine: Tighten register class check [PR108840]

gcc.target/aarch64/pr108840.c has failed since r15-268-g9dbff9c05520
(which means that I really ought to have looked at it earlier).

The test wants us to fold an SImode AND into all shifts that use it.
This is something that late-combine is supposed to do, but:

(1) the pre-RA pass chickened out because of a register pressure check

(2) the post-RA pass can't handle it, because the shift uses are in
    QImode and the sets are in SImode

Both are things that would be good to fix.  But (1) is particularly
silly.  The constraints on the AND have "rk" for the destination
(so allowing the stack pointer) and "r" for the first source.
Including the stack pointer made the destination seem more permissive
than the source.

The intention was instead to check whether there are any
*allocatable* registers in the destination class that aren't
present in the source.

That's enough for all tests but the last one.  The last one still
fails because combine merges the final shift with the move into
the hard return register, giving an arithmetic instruction with
a hard register destination.  Pre-RA late-combine currently punts
on those, again due to register pressure concerns.  That too is
something I'd like to relax, but not for GCC 15.  In the interim,
the best thing seems to be to disable combine for the test.

gcc/
PR rtl-optimization/108840
* late-combine.cc (late_combine::check_register_pressure):
Take only allocatable registers into account when checking
the permissiveness of register classes.

gcc/testsuite/
PR rtl-optimization/108840
* gcc.target/aarch64/pr108840.c: Run at -O2 but disable combine.

pair-fusion: Tweak wording in dump message [PR118320]

As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675978.html
this tweaks the dump messasge added with the fix for PR118320 since it doesn't
just apply to load pairs.

gcc/ChangeLog:

PR rtl-optimization/118320
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Tweak wording in dump
message when punting on invalid use arrays.

aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
generic_prefetch_tune in generic_armv8_a_tunings.

This patch updates the pointer to generic_armv8_a_prefetch_tune.

This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:

* config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
struct pointer.

tree-optimization/98845 - ICE with tail-merging and DCE/DSE disabled

The following shows that tail-merging will make dead SSA defs live
in paths where it wasn't before, possibly introducing UB or as
in this case, uses of abnormals that eventually fail coalescing
later. The fix is to register such defs for stmt comparison.

PR tree-optimization/98845
* tree-ssa-tail-merge.cc (stmt_local_def): Consider a
def with no uses not local.

* gcc.dg/pr98845.c: New testcase.
* gcc.dg/pr81192.c: Adjust.

RISC-V: Fix failed tests for regression due to fix ICE patch

Ref:
https://github.com/ewlu/gcc-precommit-ci/issues/3096#issue-2854419069

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-9.c: Added new failure check.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-17.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-18.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-19.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-20.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-21.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-22.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-23.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-24.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-25.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-26.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-27.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-28.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-29.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: Likewise.

RISC-V: Fix ICE for target attributes has different xlen size

This patch would like to avoid the ICE when the target attribute
specific the xlen different to the cmd. Aka compile with rv64gc
but target attribute with rv32gcv_zbb. For example as blow:

 1 │ long foo (long a, long b)
 2 │ __attribute__((target("arch=rv32gcv_zbb")));
 3 │
 4 │ long foo (long a, long b)
 5 │ {
 6 │ return a + (b * 2);
 7 │ }

when compile with rv64gc -O3, it will have ICE similar as below

during RTL pass: fwprop1
test.c: In function ‘foo’:
test.c:10:1: internal compiler error: in add_use, at
rtl-ssa/accesses.cc:1234
 10 | }
 | ^
0x44d6b9d internal_error(char const*, ...)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x408fac9 rtl_ssa::function_info::add_use(rtl_ssa::use_info*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/accesses.cc:1234
0x40a5eea
rtl_ssa::function_info::create_reg_use(rtl_ssa::function_info::build_info&,
rtl_ssa::insn_info*, rtl_ssa::resource_info)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/insns.cc:496
0x4456738
rtl_ssa::function_info::add_artificial_accesses(rtl_ssa::function_info::build_info&,
df_ref_flags)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:900
0x4457297
rtl_ssa::function_info::start_block(rtl_ssa::function_info::build_info&,
rtl_ssa::bb_info*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1082
0x4453627
rtl_ssa::function_info::bb_walker::before_dom_children(basic_block_def*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:118
0x3e9f3fb dom_walker::walk(basic_block_def*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:311
0x445806f rtl_ssa::function_info::process_all_blocks()
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1298
0x40a22d3 rtl_ssa::function_info::function_info(function*)
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/functions.cc:51
0x3ec3f80 fwprop_init
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:893
0x3ec420d fwprop
 /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:963
0x3ec43ad execute

Consider stage 4, we just report error for the above scenario when
detect the cmd xlen is different to the target attribute during the
target hook TARGET_OPTION_VALID_ATTRIBUTE_P implementation.

PR target/118540

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Report error when cmd xlen is different with target attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118540-1.c: New test.
* gcc.target/riscv/rvv/base/pr118540-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

i386: Re-order i386.opt.urls

The order of i386.opt.urls need to be the same as i386.opt.

gcc/ChangeLog:

* config/i386/i386.opt.urls: Adjust the order for avx10.2
and avx10.2-512 due to their order change in i386.opt.

[testsuite] fix check-function-bodies usage

The existing usage comment for check-function-bodies is presumably a
typo, as it doesn't match existing uses. Fix it.

for gcc/testsuite/ChangeLog

* lib/scanasm.exp (check-function-bodies): Fix usage comment.

[ifcombine] cope with signbit tests of extended values

A compare with zero may be taken as a sign bit test by
fold_truth_andor_for_ifcombine, but the operand may be extended from a
narrower field.  If the operand was narrower, the bitsize will reflect
the narrowing conversion, but if it was wider, we'll only know whether
the field is sign- or zero-extended from unsignedp, but we won't know
whether it needed to be extended, because arg will have changed to the
narrower variable when we get to the point in which we can compute the
arg width.  If it's sign-extended, we're testing the right bit, but if
it's zero-extended, there isn't any bit we can test.

Instead of punting and leaving the foldable compare to be figured out
by another pass, arrange for the sign bit resulting from the widening
zero-extension to be taken as zero, so that the modified compare will
yield the desired result.

While at that, avoid swapping the right-hand compare operands when
we've already determined that it was a signbit test: it no use to even
try.

for  gcc/ChangeLog

PR tree-optimization/118805
* gimple-fold.cc (fold_truth_andor_for_combine): Detect and
cope with zero-extension in signbit tests.  Reject swapping
right-compare operands if rsignbit.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118805
* gcc.dg/field-merge-26.c: New.

Daily bump.

OpenMP/Fortran: extend 'adjust_args' clause, fixes for it and declare variant [PR115271]

On the extension side, it implements OpenMP 6.0's numeric values/ranges for
the adjust_args arguments, including 'omp_num_args'. And it adds parser
support for need_device_addr. It also implements the post-OpenMP-6.0
clarification of OpenMP spec Issue #4443 regarding type(c_ptr) with
dimension being invalid for need_device_ptr.

To be done: Adding full support for need_device_addr (optional, array
descriptor, ...).

On the invalid side, it removed a bogus c_ptr check that went through
all adjust_args without checking for need_device_ptr and the current scope.

And it finally also processes 'declare variant' in an INTERFACE block,
which is part of PR115271, but it does not handle .mod file yet - the
main issue tracked in that PR.

PR fortran/115271

gcc/fortran/ChangeLog:

* gfortran.h (gfc_omp_namelist): Change need_device_ptr to adj_args
union and add more flags.
* openmp.cc (gfc_match_omp_declare_variant,
gfc_resolve_omp_declare): For adjust_args, handle need_device_addr
and numeric values/ranges besides dummy argument names.
(resolve_omp_dispatch): Remove bogus a adjust_args check.
* trans-decl.cc (gfc_handle_omp_declare_variant): New.
(gfc_generate_module_vars, gfc_generate_function_code): Call it.
* trans-openmp.cc (gfc_trans_omp_declare_variant): Handle numeric
values/ranges besides dummy argument names.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/adjust-args-1.f90: Update dg-.* expectations.
* gfortran.dg/gomp/adjust-args-2.f90: Likewise.
* gfortran.dg/gomp/adjust-args-2a.f90: Likewise.
* gfortran.dg/gomp/adjust-args-3.f90: Likewise.
* gfortran.dg/gomp/adjust-args-4.f90: Remove array from c_ptr.
* gfortran.dg/gomp/adjust-args-5.f90: Likewise.
* gfortran.dg/gomp/adjust-args-11.f90: Likewise. Add check that
INTERFACE is now handled in subroutines and in modules.
* gfortran.dg/gomp/adjust-args-13.f90: New test.
* gfortran.dg/gomp/adjust-args-14.f90: New test.
* gfortran.dg/gomp/adjust-args-15.f90: New test.
* gfortran.dg/gomp/declare-variant-21.f90: New test.

i386: Simplify PARALLEL RTX scan in ix86_find_all_reg_use

UNSPEC and UNSPEC_VOLATILE never store. Remove unnecessary checks and
simplify RTX scan in ix86_find_all_reg_use to scan only for SET RTX
in the PARALLEL.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_find_all_reg_use):
Scan only for SET RTX in PARALLEL.

middle-end: Fixup constant integers when expanding __builtin_crc [PR118288]

Constant integers with MSB set have to be represented as corresponding
signed integers. Use gen_int_mode to emit them in the correct way.

PR middle-end/118288

gcc/ChangeLog:

* builtins.cc (expand_builtin_crc_table_based):
Use gen_int_mode to emit constant integers with MSB set.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118288.c: New test.

c++: add fixed test [PR102455]

Fixed by r13-4564 but the tests are very different.

PR c++/102455

gcc/testsuite/ChangeLog:

* g++.dg/ext/vector43.C: New test.

c++: extended temps and statement-exprs [PR118763]

My last patch for 118856 broke the test for 118763 (which my testing didn't
catch, for some reason), because it effectively reverted Jakub's recent fix
(r15-7415) for that bug.  It seems we need a new flag to indicate internal
temporaries.

In that patch Jakub wondered if other uses of CLEANUP_EH_ONLY would have the
same issue with jumps out of a statement-expr, and indeed it seems that
maybe_push_temp_cleanup and now set_up_extended_ref_temp have the same
problem.  Since maybe_push_temp_cleanup already uses a flag, we can easily
stop setting CLEANUP_EH_ONLY there as well.  Since set_up_extended_ref_temp
doesn't, working around this issue there will be more involved.

PR c++/118856
PR c++/118763

gcc/cp/ChangeLog:

* cp-tree.h (TARGET_EXPR_INTERNAL_P): New.
* call.cc (extend_temps_r): Check it instead of CLEANUP_EH_ONLY.
* tree.cc (get_internal_target_expr): Set it instead.
* typeck2.cc (maybe_push_temp_cleanup): Don't set CLEANUP_EH_ONLY.

gcc/testsuite/ChangeLog:

* g++.dg/ext/stmtexpr29.C: New test.

c++: add fixed test [PR96364]

We were rejecting this, but the test compiles correctly since r14-6346.

PR c++/96364

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/gen-attrs-88.C: New test.

tree-optimization/118895 - ICE during PRE

When we simplify a NARY during PHI translation we have to make sure
to not inject not available operands into it given that might violate
the valueization hook constraints and we'd pick up invalid
context-sensitive data in further simplification or as in this case
later ICE when we try to insert the expression.

PR tree-optimization/118895
* tree-ssa-sccvn.cc (vn_nary_build_or_lookup_1): Only allow
CSE if we can verify the result is available.

* gcc.dg/pr118895.c: New testcase.

AVR: ad target/118764 - Mention CVT availability in device-specs comment.

gcc/
PR target/118764
* config/avr/gen-avr-mmcu-specs.cc (print_mcu)
[has CVT]: Mention CVT in header comment of generated specs file.

gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

When making warnings trigger a failure in template substitution I
could not find any way to trigger the warning about builtin speculation
not being available on the given target.

Turns out I misread the code -- this warning happens when the
speculation_barrier pattern is not defined.

Here we add an effective target to represent
"__builtin_speculation_safe_value is available on this target" and use
that to adjust our test on SFINAE behaviour accordingly.
N.b. this means that we get extra testing -- not just that things work
on targets which support __builtin_speculation_safe_value, but also that
the behaviour works on targets which don't support it.

Tested with AArch64 native, AArch64 cross compiler, and RISC-V cross
compiler (just running the tests that I've changed).

Ok for trunk?

gcc/testsuite/ChangeLog:

PR target/117991
* g++.dg/template/builtin-speculation-overloads.def: SUCCESS
argument in SPECULATION_ASSERTS now uses a macro `true_def`
instead of the literal `true` for arguments which should work
with `__builtin_speculation_safe_value`.
* g++.dg/template/builtin-speculation-overloads1.C: Define
`true_def` macro on command line to compiler according to the
effective target representing that
`__builtin_speculation_safe_value` does something on this
target.
* g++.dg/template/builtin-speculation-overloads4.C: Likewise.
* lib/target-supports.exp
(check_effective_target_speculation_barrier_defined): New.

Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>

Avoid shift wider than unsigned HOST_WIDE_INT on unsigned integer exponentiation.

this patch is a variation of Jakub's patch in the PR, which
avoids overflow on the mask used for exponentiation and
fixes unsigned HOST_WIDE_INT. I tried testing this on
a POWER machine, but --with-build-config=bootstrap-ubsan
fails bootstrap there.

gcc/fortran/ChangeLog:

PR fortran/118862
* trans-expr.cc (gfc_conv_cst_int_power): Use functions for
unsigned wide integer.
(gfc_conv_cst_uint_power): Avoid generating the mask if it would
overflow an unsigned HOST_WIDE_INT. Format fixes.

i386: Regenerate i386.opt.urls

We need to regenerate i386.opt.urls after removing -mavx10.1.

gcc/ChangeLog:

* config/i386/i386.opt.urls: Regenetated.

i386: Re-alias avx10.2 to 512 bit and deprecate -mno-avx10.2-[256,512]

As mentioned in avx10.1 option deprecate patch, based on the feedback
we got, we would like to re-alias avx10.x to 512 bit.

For -mno- options, also mentioned in the previous patch, it is confusing
what it is disabling when it comes to avx10. So we will only provide
-mno-avx10.x options from AVX10.2, disabling the whole AVX10.x.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_UNSET): Adjust macro.
(OPTION_MASK_ISA2_AVX10_2_256_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX10_2_UNSET): New.
(ix86_handle_option): Remove disable part for avx10.2-256.
Rename avx10.2-512 switch case to avx10.2 and adjust disable
part macro.
* common/config/i386/i386-isas.h: Adjust avx10.2 and
avx10.2-512.
* config/i386/driver-i386.cc
(host_detect_local_cpu): Do not append -mno-avx10.x-256
for -march=native.
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Adjust avx10.2 and
avx10.2-512.
* config/i386/i386.opt: Reject Negative for mavx10.2-256.
Alias mavx10.2-512 to mavx10.2. Reject Negative for
mavx10.2-512.
* doc/extend.texi: Adjust documentation.
* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c:
Add missing avx10_2_512 check.
* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10-check.h: Change avx10.2 to avx10.2-256.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-comibf-1.c: Ditto.
* gcc.target/i386/avx10_2-comibf-2.c: Ditto.
* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
* gcc.target/i386/avx10_2-comibf-4.c: Ditto.
* gcc.target/i386/avx10_2-compare-1.c: Ditto.
* gcc.target/i386/avx10_2-compare-1b.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
* gcc.target/i386/avx10_2-movrs-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-vaddbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcmpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcomisbf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vcomisbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2usis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2usis-2.c: Ditto.
* gcc.target/i386/avx10_2-vdivbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfpclassbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetexpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmaxbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
* gcc.target/i386/avx10_2-vmovd-1.c: Ditto.
* gcc.target/i386/avx10_2-vmovd-2.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vmulbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vrcpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vreducebf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrndscalebf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrsqrtbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vscalefbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsqrtbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsubbf16-2.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Ditto.
* gcc.target/i386/part-vect-vec_cmpbf.c: Ditto.
* gcc.target/i386/pr117495.c: Ditto.
* gcc.target/i386/sm4-avx10_2-1.c: Ditto.
* gcc.target/i386/sm4-check.h: Ditto.
* gcc.target/i386/vnniint16-auto-vectorize-3.c: Ditto.
* gcc.target/i386/vnniint8-auto-vectorize-3.c: Ditto.
* lib/target-supports.exp: Ditto.

i386: Deprecate -m[no-]avx10.1 and make -mno-avx10.1-512 to disable the whole AVX10.1

Based on the feedback we got, we would like to re-alias avx10.x to 512
bit in the future. This leaves the current avx10.1 alias to 256 bit
inconsistent. Since it has been there for GCC 14.1 and GCC 14.2,
we decide to deprecate avx10.1 alias. The current proposal is not
adding it back in the future, but it might change if necessary.

For -mno- options, it is confusing what it is disabling when it comes
to avx10. Since there is barely usage enabling AVX10 with 512 bit
then disabling it, we will only provide -mno-avx10.x options in the
future, disabling the whole AVX10.x. If someone really wants to disable
512 bit after enabling it, -mavx10.x-512 -mno-avx10.x -mavx10.x-256 is
the only way to do that since we also do not want to break the usual
expression on -m- options enabling everything mentioned.

However, for avx10.1, since we deprecated avx10.1, there is no reason
we should have -mno-avx10.1. Thus, we need to keep -mno-avx10.1-[256,512].
To avoid confusion, we will make -mno-avx10.1-512 to disable the
whole AVX10.1 set to match the future -mno-avx10.x.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX2_UNSET): Change AVX10.1 unset macro.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_1_UNSET): New.
(ix86_handle_option): Adjust AVX10.1 unset macro.
* common/config/i386/i386-isas.h: Remove avx10.1.
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Adjust warning message.
* config/i386/i386.opt: Remove mavx10.1.
* doc/extend.texi: Remove avx10.1 and adjust doc.
* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10-check.h: Change to avx10.1-256.
* gcc.target/i386/avx10_1-1.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-14.c: Ditto.
* gcc.target/i386/avx10_1-21.c: Ditto.
* gcc.target/i386/avx10_1-22.c: Ditto.
* gcc.target/i386/avx10_1-23.c: Ditto.
* gcc.target/i386/avx10_1-24.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-5.c: Ditto.
* gcc.target/i386/avx10_1-6.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/pr117946.c: Ditto.
* gcc.target/i386/avx10_1-12.c: Adjust warning message.
* gcc.target/i386/avx10_1-19.c: Ditto.
* gcc.target/i386/avx10_1-17.c: Adjust to no-avx10.1-512.

i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

When AVX512 is not explicitly set, we should not take EVEX512 bit into
consideration when checking vector size. It will solve the intrin header
file reporting warnings when compiling with -Wsystem-headers.

However, there is side effect on the usage for '-march=xxx -mavx10.1-256',
where xxx is with AVX512. It will not report warning on vector size for now.
Since it is a rare usage, we will take it.

gcc/ChangeLog:

PR target/118815
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not check vector size conflict when AVX512 is not explicitly
set.

gcc/testsuite/ChangeLog:

PR target/118815
* gcc.target/i386/pr118815.c: New test.

LoongArch: Fix the issue of function jump out of range caused by crtbeginS.o [PR118844].

Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MB, linking with lld will definitely fail
(ld will not fail because the order of the two is different).

The linking order:
lld: crtbeginS.o + .text + .plt
ld : .plt + crtbeginS.o + .text

To solve this issue, add '-mcmodel=extreme' when compiling crtbeginS.o.

PR target/118844

libgcc/ChangeLog:

* config/loongarch/t-crtstuff: Add '-mcmodel=extreme'
to CRTSTUFF_T_CFLAGS_S.

Daily bump.

[PR target/118248] Avoid bogus alloca call in RISC-V backend

This is Jakub's patch and Ian's testcase for the slightly vexing fault building
the D runtime with an s390x-x-riscv cross compiler.

The core issue is we're allocating a vector to hold temporary registers
unconditionally, including cases where the vector isn't needed because the loop
isn't going to iterate.

In the cases where the vector isn't needed the length is computed with an
expression (x / y) - 1 where x / y will be zero.  The alloca(-1) on the s390
platform triggers a fault.  We haven't seen the fault with an x86 cross, but we
can certainly see the bogus value being passed to alloca with a debugger.

Jakub patch just conditionalizes the whole block in a sensible way.  So it
looks larger than it really is.  I thought it might be better to do a bit of
manual CSE on this code to make it even more obvious, but I think we're
ultimately OK here.

Ian provided the testcase, collapsed down into equivalent C code. Again, it
doesn't fault on an x86-x-riscv, but I can see the incorrect behavior with a
debugger.

And a shout-out to Stefan for providing a docker based reproducer, it really
helped track this down.

PR target/118248
gcc/
* config/riscv/riscv-string.cc (riscv_block_move_straight): Only
allocate REGS buffer if it will be needed.

gcc/testsuite
* gcc.target/riscv/pr118248.c: New test.

AVR: ad target/118764 - Let -mcvt set built-in macro __AVR_CVT__

gcc/
PR target/118764
* config/avr/avr-c.cc (avr_cpu_cpp_builtins)
[TARGET_CVT]: Define __AVR_CVT__.
* doc/invoke.texi (AVR Built-in Macros): Document __AVR_CVT__.

AVR: Don't asm output operations for unused result bytes.

When REG_UNUSED notes indicate that some result bytes are not
used by the following code, then there's no need to asm out them.
The patch uses such notes for the asm out of AND, IOR, XOR, PLUS, MINUS.

gcc/
* config/avr/avr.cc (avr_result_regno_unused_p): New static function.
(avr_out_bitop): Only output result bytes that are used.
(avr_out_plus_1): Same.

AVR: Diagnose unsupported built-ins in avr_resolve_overloaded_builtin.

This patch executes avr_builtin_supported_p at a later time and in
avr_resolve_overloaded_builtin. This allows for better diagnostics
and avoids lto1 hiccups when a built-in decl is NULL_TREE.

gcc/
* config/avr/avr-protos.h (avr_builtin_supported_p): Remove.
* config/avr/avr.cc (avr_init_builtins): Don't initialize
non-available built-ins with NULL_TREE.
(avr_builtin_supported_p): Move to...
* config/avr/avr-c.cc: ...here.
(avr_resolve_overloaded_builtin): Run avr_builtin_supported_p.

Remove double output of attr->save.

In the recent patch for dumping all attributes, there were
duplicates for attr->save, which is output via gfc_code2string
previously. This patch removes that double output.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_attr): Remove double output
of attr->save.

c++: Add testcase for now fixed issue [PR117324]

The case in this PR does not ICE anymore after the fix for PR118319.

This patch simply adds the case to the testsuite.

PR c++/117324

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg19.C: New test.

x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed by stack or frame
registers. We first find all registers defined by stack or frame
registers. Then check memory accesses by such registers, including
stack and frame registers.

gcc/

PR target/109780
PR target/109093
* config/i386/i386.cc (ix86_update_stack_alignment): New.
(ix86_find_all_reg_use_1): Likewise.
(ix86_find_all_reg_use): Likewise.
(ix86_find_max_used_stack_alignment): Also check memory accesses
from registers defined by stack or frame registers.

gcc/testsuite/

PR target/109780
PR target/109093
* g++.target/i386/pr109780-1.C: New test.
* gcc.target/i386/pr109093-1.c: Likewise.
* gcc.target/i386/pr109780-1.c: Likewise.
* gcc.target/i386/pr109780-2.c: Likewise.
* gcc.target/i386/pr109780-3.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

[PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

So this is a fairly old regression, but with all the ranger work that's been
done, it's become easy to resolve.

The basic idea here is to use known relationships between two operands of a
SUB_OVERFLOW IFN to statically compute the overflow state and ultimately allow
turning the IFN into simple arithmetic (or for the tests in this BZ elide the
arithmetic entirely).

The regression example is when the two inputs are known equal. In that case
the subtraction will never overflow. But there's a few other cases we can
handle as well.

a == b -> never overflows
a > b -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a always overflows when A and B are unsigned

Bootstrapped and regression tested on x86, and regression tested on the usual
cross platforms.

This is Jakub's version of the vr-values.cc fix rather than Jeff's.

PR tree-optimization/98028
gcc/
* vr-values.cc (check_for_binary_op_overflow): Try to use a known
relationship betwen op0/op1 to statically determine overflow state.

gcc/testsuite
* gcc.dg/tree-ssa/pr98028.c: New test.

Fortran: passing of derived type to VALUE,OPTIONAL dummy argument [PR118080]

For scalar OPTIONAL dummy arguments with the VALUE attribute, gfortran
passes a hidden flag to denote presence or absence of the actual argument
for intrinsic types. Extend this treatment to derived type (user-defined
as well as from intrinsic module ISO_C_BINDING).

PR fortran/118080

gcc/fortran/ChangeLog:

* gfortran.texi: Adjust documentation.
* trans-decl.cc (create_function_arglist): Adjust to pass hidden
presence flag also for derived type dummies with VALUE,OPTIONAL
attribute.
* trans-expr.cc (gfc_conv_expr_present): Expect hidden presence
flag also for derived type dummies with VALUE,OPTIONAL attribute.
(conv_cond_temp): Adjust to allow derived types.
(conv_dummy_value): Extend to handle derived type dummies with
VALUE,OPTIONAL attribute.
(gfc_conv_procedure_call): Adjust for actual arguments passed to
derived type dummies with VALUE,OPTIONAL attribute.
* trans-types.cc (gfc_get_function_type): Adjust fndecl for
hidden presence flag.

gcc/testsuite/ChangeLog:

* gfortran.dg/value_optional_2.f90: New test.

Fortran: gfortran allows type(C_ptr) in I/O list

Before this patch, gfortran was accepting invalid use of
type(c_ptr) in I/O statements. The fix affects several
existing test cases so no new test case needed.

Existing tests were modified to pass by either using the
transfer function to convert to an acceptable value or
using an assignment to a like type (non-I/O).

PR fortran/117430

gcc/fortran/ChangeLog:

* resolve.cc (resolve_transfer): Change gfc_notify_std to
gfc_error.

gcc/testsuite/ChangeLog:

* gfortran.dg/c_loc_test_17.f90: Use an assignment rather than
PRINT.
* gfortran.dg/c_ptr_tests_10.f03: Use a transfer function.
* gfortran.dg/c_ptr_tests_16.f90: Use an assignment.
* gfortran.dg/c_ptr_tests_9.f03: Use a transfer function.
* gfortran.dg/init_flag_17.f90: Likewise.
* gfortran.dg/pr32601_1.f03: Use an assignment.

[PATCH] RISC-V: Fix some widen-complicate tests.

this patch adds two bridge patterns for combine in order to fix the
widen-complicate tests that regressed since GCC 14.

They key is to allow combination with a ephemeral binary-operation insn
that widens its first operand. This can subsequently be combined two
a double-widening insn. If the combination doesn't happen we fall back
to the original non-combined two insns. We have been doing the same
thing for multiply-add patterns for a while.

There are still remaining tests of a similar kind that fail. The issue
there is indeed that combine (now) lacks the capability to continue in
cases where there is no apparent local progress but two or three
seemingly no-progress combinations would help us get a better "global" result.

late_combine cannot rescue us here either because it only performs a
propagation if the source can be propagated into all its uses which is
not the case here. What we might need to do is create internal functions
for those widening operations and combine/simplify at gimple-level already.

I have done testing on rv64gcv_zvl512b but on an older local commit.
Curious what the CI has to say about it now.

Regards
Robin

gcc/ChangeLog:

* config/riscv/autovec-opt.md
(*single_widen_first_<any_widen_binop:optab><any_extend:su><mode>):
New combine "bridge" pattern.

[PATCH] rx: allow cmpstrnsi len to be zero

The SCMPU instruction doesn't change the C and Z flags when the
incoming length is zero, which means the insn will produce a
value based upon the existing flag values.

As a quick kludge, adjust these flags to ensure a zero result in this
case.

gcc/
* config/rx/rx.md (rx_cmpstrn): Correctly handle len=0 case.

[PATCH] RISC-V: testsuite: Adjust pr117722.c scan.

the test scanned for vmin and vmax instead of vminu and vmaxu.
This patch fixes that.

Will commit as obvious once the CI is OK with it.

Regards
Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117722.c: Scan for vminu and
vmaxu.

RISC-V: testsuite: Fix reduc-[89].c again.

my last fix wasn't sufficient. This patch just scans for the scalar
insns now.

Going to commit as obvious if the CI is happy.

Regards
Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Scan for add.
* gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Scan for fadd.

sarif-replay: handle the 'fixes' property (§3.27.30)

This adds support to sarif-replay to display fix-it hints
stored in GCC's SARIF output.

gcc/ChangeLog:
* libsarifreplay.cc (sarif_replayer::handle_result_obj): Call
handle_fix_object if we see a single-element "fixes" array.
(sarif_replayer::handle_fix_object): New.
(sarif_replayer::handle_artifact_change_object): New.

gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-valid/3.27.30-fixes-1.sarif: New test.
* sarif-replay.dg/2.1.0-valid/3.27.30-fixes-2.sarif: New test.
* sarif-replay.dg/2.1.0-valid/3.27.30-fixes-3.sarif: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif-replay: don't add trailing " [error]"

Our SARIF output supplies "error" for the rule ID for DK_ERROR,
since a value is required, but it's not useful to print in sarif-replay.

Filter it out.

gcc/ChangeLog:
* libsarifreplay.cc (should_add_rule_p): New.
(sarif_replayer::handle_result_obj): Use it to filter out rules
that don't make sense.

gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-valid/3.28.6-annotations-1.sarif: Update
expected output to remove trailing " [error]".
* sarif-replay.dg/2.1.0-valid/unlabelled-secondary-locations.sarif:
Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif-replay: handle relatedLocations without messages (§3.27.22)

Given the .sarif output from e.g.:

 too-many-arguments.c: In function 'test_known_fn':
 too-many-arguments.c:14:3: error: too many arguments to function 'fn_a'; expected 0, have 1
 14 | fn_a (42);
 | ^~~~ ~~
 too-many-arguments.c:8:13: note: declared here
 8 | extern void fn_a ();
 | ^~~~

the underlining of the stray argument (the "42") is captured in
'relatedLocations' (§3.27.22) but previously sarif-replay would not
display this:

 In function 'test_known_fn':
 too-many-arguments.c:14:3: error: too many arguments to function 'fn_a'; expected 0, have 1 [error]
 14 | fn_a (42);
 | ^~~~
 too-many-arguments.c:8:13: note: declared here
 8 | extern void fn_a ();
 | ^~~~

With this patch sarif-replay handles the relatedLocations element as
a secondary location in libgdiagnostics, and correctly underlines
the "42":

 In function 'test_known_fn':
 too-many-arguments.c:14:3: error: too many arguments to function 'fn_a'; expected 0, have 1 [error]
 14 | fn_a (42);
 | ^~~~ ~~
 too-many-arguments.c:8:13: note: declared here
 8 | extern void fn_a ();
 | ^~~~

gcc/ChangeLog:
* libsarifreplay.cc (sarif_replayer::handle_result_obj): Treat any
relatedLocations without messages as secondary ranges within the
diagnostic. Doing so requires stashing the notes until after
the diagnostic has been finished, so that relatedLocations can be
walked in one pass.

gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-valid/unlabelled-secondary-locations.sarif:
New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif-replay: display annotations as labelled ranges (§3.28.6) [PR118881]

In our .sarif output from e.g.:

 bad-binary-op.c: In function ‘test_4’:
 bad-binary-op.c:19:23: error: invalid operands to binary + (have ‘S’ {aka ‘struct s’} and ‘T’ {aka ‘struct t’})
 19 | return callee_4a () + callee_4b ();
 | ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
 | | |
 | | T {aka struct t}
 | S {aka struct s}

the labelled ranges are captured in the 'annotations' property of the
'location' object (§3.28.6).

However sarif-replay emits just:

 In function 'test_4':
 bad-binary-op.c:19:23: error: invalid operands to binary + (have ‘S’ {aka ‘struct s’} and ‘T’ {aka ‘struct t’}) [error]
 19 | return callee_4a () + callee_4b ();
 | ^

missing the labelled ranges.

This patch adds support to sarif-replay for the 'annotations' property;
with this patch we emit:

 In function 'test_4':
 bad-binary-op.c:19:23: error: invalid operands to binary + (have ‘S’ {aka ‘struct s’} and ‘T’ {aka ‘struct t’}) [error]
 19 | return callee_4a () + callee_4b ();
 | ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
 | | |
 | | T {aka struct t}
 | S {aka struct s}

thus showing the labelled ranges.

Doing so requires adding a new entrypoint to libgdiagnostics:
 diagnostic_physical_location_get_file
Given that we haven't yet released a stable version and that
sarif-replay is built together with libgdiagnostics I didn't
bother updating the ABI version.

gcc/ChangeLog:
PR sarif-replay/118881
* doc/libgdiagnostics/topics/physical-locations.rst: Add
diagnostic_physical_location_get_file.
* libgdiagnostics++.h (physical_location::get_file): New wrapper.
(diagnostic::add_location): Likewise.
* libgdiagnostics.cc (diagnostic_manager::get_file_by_name): New.
(diagnostic_physical_location::get_file): New.
(diagnostic_physical_location_get_file): New.
* libgdiagnostics.h (diagnostic_physical_location_get_file): New.
* libgdiagnostics.map (diagnostic_physical_location_get_file): New.
* libsarifreplay.cc (class annotation): New.
(add_any_annotations): New.
(sarif_replayer::handle_result_obj): Collect vectors of
annotations in the calls to handle_location_object and apply them
to "err" and to "note" as appropriate.
(sarif_replayer::handle_thread_flow_location_object): Pass nullptr
for annotations.
(sarif_replayer::handle_location_object): Handle §3.28.6
"annotations" property, using it to populate a new
"out_annotations" param.

gcc/testsuite/ChangeLog:
PR sarif-replay/118881
* sarif-replay.dg/2.1.0-valid/3.28.6-annotations-1.sarif: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++/modules: Don't treat template parameters as TU-local [PR118846]

There are two separate issues making various template parameters behave
as if they were TU-local.

Firstly, the TU-local detection code uses WILDCARD_TYPE_P to check for
types that are not yet concrete; for some reason UNBOUND_CLASS_TEMPLATE
is not on that list. I don't see any particular reason why it shouldn't
be, so this patch adds it; this may solve other latent issues as well.

Secondly, the TEMPLATE_DECL for a type with expressions involving
TEMPLATE_TEMPLATE_PARM_Ps is currently always constrained to internal
linkage, because the result does not have TREE_PUBLIC set. Rather than
messing with TREE_PUBLIC here, I think rather we just should ensure that
we only attempt to constrain visiblity of templates of type, variable,
or function decls.

PR c++/118846

gcc/cp/ChangeLog:

* cp-tree.h (WILDCARD_TYPE_P): Include UNBOUND_CLASS_TEMPLATE.
* decl2.cc (min_vis_expr_r): Don't assume a TEMPLATE_DECL will
be a function or variable.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118846_a.C: New test.
* g++.dg/modules/pr118846_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

testsuite: tweak constexpr-lamda1.C [PR118053]

I forgot to add the -O necessary to reproduce the bug before pushing the
fix.

PR c++/118053

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lambda1.C: Add -O.

c++: NRVO, constexpr, lambda [PR118053]

Here during constant evaluation we encounter a VAR_DECL with DECL_VALUE_EXPR
of the RESULT_DECL, where the latter has been adjusted for
pass-by-invisible-reference. We already had the code to deal with this, we
just need to use it in the non-capture case of DECL_VALUE_EXPR as well.

PR c++/118053

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Generalize
DECL_VALUE_EXPR invisiref handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lambda1.C: New test.

Remove defunct web site for site of Fortran preprocessor.

gcc/fortran/ChangeLog:

PR fortran/118159
* invoke.texi: Remove mention of defunct web site for Coco.

libstdc++: Remove unused header from <bits/shared_ptr_base.h>

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h: Do not include <bit>.

libstdc++: Simplify and comment std::jthread extension [PR100612]

Use a requires-clause on the partial specialization of the
__pmf_expects_stop_token variable template, which is used for the
extension that allows constructing std::jthread with a
pointer-to-member-function that accepts a std::stop_token argument.

Also add a comment referring to the related Bugzilla PR.

libstdc++-v3/ChangeLog:

PR libstdc++/100612
* include/std/thread (__pmf_expects_stop_token): Constrain
variable template specialization with concept. Add comment.