git.ipfire.org Git - thirdparty/gcc.git/log

ada: Fix pragma Compile_Time_Error for alignment of array types

The pragma is consistenly rejected for the alignment of array types because
Eval_Attribute does not evaluate it even if it is known.

gcc/ada/

* sem_attr.adb (Eval_Attribute): Treat Alignment like Component_Size
for array types.

ada: Enable casing on composite via -X0 instead of -X

Move case statement pattern matching out of the curated language extension
set and into the extended set.

gcc/ada/

* sem_case.adb: Replace all tests of Core_Extensions_Allowed with
corresponding tests of All_Extensions_Allowed.
* sem_ch5.adb: Likewise.
* doc/gnat_rm/gnat_language_extensions.rst: update documentation.
* gnat_rm.texi: Regenerate.

ada: Fix internal error with Put_Image aspect on access-to-class-wide type

This occurs with an instantiation of Ada.Containers.Vectors in a nested
package on an access-to-class-wide type declared with the Put_Image aspect
because of too late a freezing for the internal renaming generated for the
Put_Image procedure.

The change freezes this renaming immediately in this particular case; this
is similar to a trick used in Build_Array_Put_Image_Procedure.

gcc/ada/

* sem_ch13.adb (New_Put_Image_Subprogram): In the nondeferred case
coming from an aspect and for a type with delaying freezing, also
freeze the subprogram immediately.

ada: Simplify uses of readdir_gnat with object overlay

Code cleanup; behavior is unaffected.

gcc/ada/

* libgnat/a-direct.adb (Start_Search_Internal): Combine subtype
and object declaration.
* libgnat/g-dirope.adb (Read): Replace convoluted unchecked
conversion with an overlay.

ada: Refactor GNAT.Directory_Operations.Read to minimise runtime checks

Array assignments are likely more efficient than element-by-element
copying; in particular, they avoid constraints checks in every iteration
of a loop (when the runtime is compiled with checks enabled).

A cleanup and improvement opportunity spotted while working on improved
detection of uninitialised local scalar objects.

gcc/ada/

* libgnat/g-dirope.adb (Read): Use null-excluding,
access-to-constant type; replace element-by-element copy with
array assignments.

ada: Compiler crash on nonstatic container aggregates for Doubly_Linked_Lists

The compiler was crashing on container aggregates for the List type
coming from an instantiation of Ada.Containers.Doubly_Linked_Lists
when the aggregate has more than one iterated_element_association
with nonstatic range bounds. As part of addressing this, it was
noticed that there were also somewhat related problems with container
aggregates associated with the Ada.Containers.Bounded_Doubly_Linked_Lists
generic (and likely others like it) and mishandling of certain cases of
indexed aggregates, and those are also addressed by this set of changes.
In the case of container aggregates with at least one association with
a nonstatic range, the total length of the aggregate is determined by
expansion actions of Aggregate_Size.

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate): Move determination of
whether the aggregate is an indexed aggregate earlier in the
procedure. Test Is_Indexed_Aggregate as a criterion for generating
a call to the container type's New_Indexed function, add proper
computation of bounds to pass in to the function, and remove later
code for generating such a call. Add and improve comments.
(Aggregate_Size): Remove special treatment of case where there is
exactly one component association, and instead loop over all
component associations to determine whether any of them have a
nonstatic length. If there is at least one such nonstatic
association, return -1.
(Build_Siz_Exp): Accumulate a sum of the sizes of each of the
component associations in Siz_Exp (which will only be used if
there any associations that are of Nkind
N_Iterated_Component_Association with a nonstatic range).
(Expand_Range_Component): Fix typos in the procedure's spec
comment and block comment.

Fortran: Fix wrong code in unlimited polymorphic assignment [PR113363]

2024-05-13 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/113363
* trans-array.cc (gfc_array_init_size): Use the expr3 dtype so
that the correct element size is used.
* trans-expr.cc (gfc_conv_procedure_call): Remove restriction
that ss and ss->loop be present for the finalization of class
array function results.
(trans_class_assignment): Use free and malloc, rather than
realloc, for character expressions assigned to unlimited poly
entities.
* trans-stmt.cc (gfc_trans_allocate): Build a correct rhs for
the assignment of an unlimited polymorphic 'source'.

gcc/testsuite/
PR fortran/113363
* gfortran.dg/pr113363.f90: New test.

Revert "MIPS: Support constraint 'w' for MSA instruction"

This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.

It is not needed at all:
asm volatile ("fmadd.d %w0, %1, %2" : "+f"(a) : "f"(b), "f"(c))
is OK for us.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself.

Fortran: fix frontend memleak

gcc/fortran/ChangeLog:

* primary.cc (gfc_match_varspec): Replace 'ref' argument to
is_inquiry_ref() by NULL when the result is not needed to avoid
a memleak.

arm: Use utxb rN, rM, ror #8 to implement zero_extract on armv6.

Examining the code generated for the following C snippet on a
raspberry pi:

int popcount_lut8(unsigned *buf, int n)
{
  int cnt=0;
  unsigned int i;
  do {
    i = *buf;
    cnt += lut[i&255];
    cnt += lut[i>>8&255];
    cnt += lut[i>>16&255];
    cnt += lut[i>>24];
    buf++;
  } while(--n);
  return cnt;
}

I was surprised to see following instruction sequence generated by the
compiler:

  mov    r5, r2, lsr #8
  uxtb   r5, r5

This sequence can be performed by a single ARM instruction:

  uxtb   r5, r2, ror #8

The attached patch allows GCC's combine pass to take advantage of ARM's
uxtb with rotate functionality to implement the above zero_extract, and
likewise to use the sxtb with rotate to implement sign_extract.  ARM's
uxtb and sxtb can only be used with rotates of 0, 8, 16 and 24, and of
these only the 8 and 16 are useful [ror #0 is a nop, and extends with
ror #24 can be implemented using regular shifts],  so the approach here
is to add the six missing but useful instructions as 6 different
define_insn in arm.md, rather than try to be clever with new predicates.

Later ARM hardware has advanced bit field instructions, and earlier
ARM cores didn't support extend-with-rotate, so this appears to only
benefit armv6 era CPUs (e.g. the raspberry pi).

Patch posted:
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01339.html
Approved by Kyrill Tkachov:
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01881.html

2024-05-12  Roger Sayle  <roger@nextmovesoftware.com>
    Kyrill Tkachov  <kyrylo.tkachov@foss.arm.com>

* config/arm/arm.md (*arm_zeroextractsi2_8_8, *arm_signextractsi2_8_8,
*arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16,
*arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New.

2024-05-12  Roger Sayle  <roger@nextmovesoftware.com>
    Kyrill Tkachov  <kyrylo.tkachov@foss.arm.com>

* gcc.target/arm/extend-ror.c: New test.

[to-be-committed,RISC-V] Improve usage of slli.uw in constant synthesis

And an improvement to using slli.uw...

I recently added the ability to use slli.uw in the synthesis path. That
code was conditional on the right justified constant being a LUI_OPERAND
after sign extending from bit 31 to bit 63.

That code is working fine, but could be improved. Specifically there's
no reason it shouldn't work for LUI+ADDI under the same circumstances.
So rather than testing the sign extended, right justified constant is a
LUI_OPERAND, we can just test that the right justified constant has
precisely 32 leading zeros.

gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Use slli.uw more.

gcc/testsuite
* gcc.target/riscv/synthesis-5.c: New test.

[to-be-committed] RISC-V Fix minor regression in synthesis WRT bseti usage

Overnight testing showed a small number of cases where constant synthesis was
doing something dumb.  Specifically generating more instructions than the
number of bits set in the constant.

It was a minor goof in the recent bseti code.  In the code to first figure out
what bits LUI could set, I included one bit outside the space LUI operates.
For some dumb reason I kept thinking in terms of 11 low bits belonging to addi,
but it's actually 12 bits.  The net is what we thought should be a single LUI
for costing turned into LUI+ADDI.

I didn't let the test run to completion, but over the course of 12 hours it
found 9 cases.  Given we know that the triggers all have 0x800 set, I bet we
could likely find more, but I doubt it's that critical to cover every possible
constant that regressed.

gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Fix thinko in testing
when lui can be used to set several bits in bseti path.

gcc/testsuite

* gcc.target/riscv/synthesis-4.c: New test

Regenerate cygming.opt.urls and mingw.opt.urls

The new cygming.opt.urls and mingw.opt.urls in the
gcc/config/mingw/cygming.opt.urls directory need to generated by make
regenerate-opt-urls in the gcc subdirectory. They still contained
references to the gcc/config/i386 directory from which they were
copied.

Fixes: 1f05dfc131c7 ("Reuse MinGW from i386 for AArch64")
Fixes: e8d003736e6c ("Rename "x86 Windows Options" to "Cygwin and MinGW Options"")
gcc/ChangeLog:

* config/mingw/cygming.opt.urls: Regenerate.
* config/mingw/mingw.opt.urls: Likewise.

Fortran: Unlimited polymorphic intrinsic function arguments [PR84006]

2024-05-12 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84006
PR fortran/100027
PR fortran/98534
* iresolve.cc (gfc_resolve_transfer): Emit a TODO error for
unlimited polymorphic mold.
* trans-expr.cc (gfc_resize_class_size_with_len): Use the fold
even if a block is not available in which to fix the result.
(trans_class_assignment): Enable correct assignment of
character expressions to unlimited polymorphic variables using
lhs _len field and rse string_length.
* trans-intrinsic.cc (gfc_conv_intrinsic_storage_size): Extract
the class expression so that the unlimited polymorphic class
expression can be used in gfc_resize_class_size_with_len to
obtain the storage size for character payloads. Guard the use
of GFC_DECL_SAVED_DESCRIPTOR by testing for DECL_LANG_SPECIFIC
to prevent the ICE. Also, invert the order to use the class
expression extracted from the argument.
(gfc_conv_intrinsic_transfer): In same way as 'storage_size',
use the _len field to obtaining the correct length for arg 1.
Add a branch for the element size in bytes of class expressions
with provision to make use of the unlimited polymorphic _len
field. Again, the class references are explicitly identified.
'mold_expr' was already declared. Use it instead of 'arg'. Do
not fix 'dest_word_len' for deferred character sources because
reallocation on assign makes use of it before it is assigned.

gcc/testsuite/
PR fortran/84006
PR fortran/100027
* gfortran.dg/storage_size_7.f90: New test.

PR fortran/98534
* gfortran.dg/transfer_class_4.f90: New test.

Fortran: fix dependency checks for inquiry refs [PR115039]

gcc/fortran/ChangeLog:

PR fortran/115039
* expr.cc (gfc_traverse_expr): An inquiry ref does not constitute
a dependency and cannot collide with a symbol.

gcc/testsuite/ChangeLog:

PR fortran/115039
* gfortran.dg/statement_function_5.f90: New test.

[PATCH v4 4/4] Output S_COMPILE3 symbol in CodeView debug section

Outputs the S_COMPILE3 symbol in the CodeView .debug$S debug section.
The DEBUG_S_SYMBOLS block added here makes up pretty much everything
that isn't data structures or line numbers; we add the S_COMPILE3 symbol
here to start it off.

This is a descriptive bit, the most interesting part of which is the
version of the compiler used.

gcc/
* dwarf2codeview.cc (DEBUG_S_SYMBOLS): Define.
(S_COMPILE3, CV_CFL_80386, CV_CFL_X64): Likewise.
(CV_CFL_C, CV_CFL_CXX): Likewise.
(SYMBOL_START_LABEL, SYMBOL_END_LABEL): Likewise.
(start_processor, language_constant): New functions.
(write_compile3_symbol, write_codeview_symbols): Likewise.
(codeview_debug_finish): Call write_codeview_symbols.

[PATCH v2 3/4] Output line numbers in CodeView section

Outputs the DEBUG_S_LINES block in the CodeView .debug$S section, which
maps between line numbers and addresses.

You'll need a fairly recent version of GAS for the .secidx directive to
be recognized.

gcc/
* dwarf2codeview.cc (DEBUG_S_LINES, LINE_LABEL): Define.
(END_FUNC_LABEL): Likewise.
(struct codeview_line, codeview_line_block): New structures.
(codeview_function): Likewise.
(line_label_num, func_label_num, funcs, last_func): New variables.
(last_filename, last_file_id): Likewise.
(codeview_source_line, write_line_numbers): New functions.
(codeview_switch_text_section, codeview_end_epilogue): Likewise.
(codeview_debug_finish): Call write_line_numbers.
* dwarf2codeview.h (codeview_source_line): Prototype.
(codeview_switch_text_secction, codeview_end_epilogue): Likewise.
* dwarf2out.cc (dwarf2_end_epilogue): Add codeview support.
(dwarf2out_switch_text_section): Likewise.
(dwarf2out_source_line): Likewise.
* opts.cc (finish_options): Handle codeview debugging symbols.

[PATCH v2 2/4] Output file checksums in CodeView section

Outputs the file name and MD5 hash of the main source file into the
CodeView .debug$S section, along with that of any #include'd files.

gcc/
* dwarf2codeview.cc (DEBUG_S_STRINGTABLE): Define.
(DEBUG_S_FILECHKSMS, CHKSUM_TYPE_MD5, HASH_SIZE): Likewise.
(codeview_string, codeview_source_file): New structures.
(struct string_hasher): New class for codeview_string hashing.
(files, last_file, num_files, string_offset): New variables.
(strings_hstab, strings, last_string): Likewise.
(add_string, codevie_start_source_file): New functions.
(write_strings_tabe, write_soruce_files): Likewise.
(codeview_debug_finish): Call new functions.
* dwarf2codeview.h (codeview_start_source_file): Prototype.
* dwarf2out.cc (dwarf2out_start_source_file): Handle codeview.

[PATCH v2 1/4] Support for CodeView debugging format

This patch and the following add initial support for Microsoft's
CodeView debugging format, as used by MSVC, to mingw targets.

Note that you will need a recent version of binutils for this to be
useful. The best way to view the output is to run Microsoft's
cvdump.exe, found in their microsoft-pdb repo on GitHub, against the
object files.

gcc/

* Makefile.in (OBJS): Add dwarf2codeview.o.
(GTFILES): Add dwarf2codeview.cc
* config/i386/cygming.h (CODEVIEW_DEBUGGING_INFO): Define.
* dwarf2codeview.cc: New file.
* dwarf2codeview.h: New file.
* dwarf2out.cc: Include dwarf2codeview.h.
(dwarf2out_finish): Call codeview_debug_finish as needed.
* flag-types.h (DINFO_TYPE_CODEVIEW): Add enum member.
(CODEVIEW_DEBUG): Define.
* flags.h (codeview_debuginfo_p): Proottype.
* opts.cc (debug_type_names): Add codeview.
(debug_type_masks): Add CODEVIEW_DEBUG.
(df_set_names): Add codeview.
(codeview_debuginfo_p): New function.
(dwarf_based_debuginfo_p): Add CODEVIEW clause.
(set_debug_level): Handle CODEVIEW_DEBUG.
* toplev.cc (process_options): Handle codeview.

gcc/testsuite
* gcc.dg/debug/codeview/codeview-1.c: New test.
* gcc.dg/debug/codeview/codeview.exp: New testsuite driver.

tree-optimization/114760 - check variants of >> and << in loop-niter

When recognizing bit counting idiom, include pattern "x * 2"
for "x << 1", and "x / 2" for "x >> 1" (given x is unsigned).

gcc/ChangeLog:
PR tree-optimization/114760
* tree-ssa-loop-niter.cc (is_lshift_by_1): New function
to check if STMT is equivalent to x << 1.
(is_rshift_by_1): New function to check if STMT is
equivalent to x >> 1.
(number_of_iterations_cltz): Enhance the identification
of logical shift by one.
(number_of_iterations_cltz_complement): Enhance the
identification of logical shift by one.

gcc/testsuite/ChangeLog:
PR tree-optimization/114760
* gcc.dg/tree-ssa/pr114760-1.c: New test.
* gcc.dg/tree-ssa/pr114760-2.c: New test.

[prange] Default unimplemented prange operators to false.

The canonical way to indicate that a range operator is unsupported is
to return false, which has the sematic meaning of VARYING. This patch
cleans up a few default virtuals that were trying harder to set
VARYING manually.

gcc/ChangeLog:

* range-op-ptr.cc (range_operator::fold_range): Return false.

[prange] Do not trap by default on range dispatch mismatches.

The trap in the range-op dispatch code is really an internal debugging
aid, and only a temporary one for a few weeks while the dust settles.
This patch turns it off by default, allowing problematic passes to
turn it on for analysis.

gcc/ChangeLog:

* range-op.cc (TRAP_ON_UNHANDLED_POINTER_OPERATORS): New
(range_op_handler::fold_range): Use it.
(range_op_handler::op1_range): Same.
(range_op_handler::op2_range): Same.
(range_op_handler::lhs_op1_relation): Same.
(range_op_handler::lhs_op2_relation): Same.
(range_op_handler::op1_op2_relation): Same.

c++: Implement __is_nothrow_invocable built-in trait

This patch implements built-in trait for std::is_nothrow_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_nothrow_invocable.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_NOTHROW_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_nothrow_invocable.
* g++.dg/ext/is_nothrow_invocable.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __is_invocable built-in trait

This patch implements built-in trait for std::is_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __array_rank built-in trait

This patch implements built-in trait for std::rank.

gcc/cp/ChangeLog:

* cp-trait.def: Define __array_rank.
* constraint.cc (diagnose_trait_expr): Handle CPTK_RANK.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __array_rank.
* g++.dg/ext/rank.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __decay built-in trait

This patch implements built-in trait for std::decay.

gcc/cp/ChangeLog:

* cp-trait.def: Define __decay.
* semantics.cc (finish_trait_type): Handle CPTK_DECAY.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __decay.
* g++.dg/ext/decay.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __add_rvalue_reference built-in trait

This patch implements built-in trait for std::add_rvalue_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_rvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_RVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_rvalue_reference.
* g++.dg/ext/add_rvalue_reference.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __add_lvalue_reference built-in trait

This patch implements built-in trait for std::add_lvalue_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_lvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_LVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_lvalue_reference.
* g++.dg/ext/add_lvalue_reference.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __remove_all_extents built-in trait

This patch implements built-in trait for std::remove_all_extents.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_all_extents.
* semantics.cc (finish_trait_type): Handle
CPTK_REMOVE_ALL_EXTENTS.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__remove_all_extents.
* g++.dg/ext/remove_all_extents.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __remove_extent built-in trait

This patch implements built-in trait for std::remove_extent.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_extent.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_EXTENT.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_extent.
* g++.dg/ext/remove_extent.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __add_pointer built-in trait

This patch implements built-in trait for std::add_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_ADD_POINTER.
(object_type_p): New function.
(referenceable_type_p): Likewise.
(trait_expr_value): Use object_type_p.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __add_pointer.
* g++.dg/ext/add_pointer.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Implement __is_unbounded_array built-in trait

This patch implements built-in trait for std::is_unbounded_array.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_unbounded_array.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_UNBOUNDED_ARRAY.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_unbounded_array.
* g++.dg/ext/is_unbounded_array.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jason Merrill <jason@redhat.com>

[RISC-V] Use shNadd for constant synthesis

So here's the next idiom to improve constant synthesis.

The basic idea here is to try and use shNadd to generate the constant when profitable.

Let's take 0x300000801.  Right now that generates:

        li      a0,3145728
        addi    a0,a0,1
        slli    a0,a0,12
        addi    a0,a0,-2047

But we can do better.  The constant is evenly divisible by 9 resulting in
0x55555639 which doesn't look terribly interesting.  But that constant can be
generated with two instructions, then we can use a sh3add to multiply it by 9.
So the updated sequence looks like:

        li      a0,1431654400
        addi    a0,a0,1593
        sh3add  a0,a0,a0

This doesn't trigger a whole lot, but I haven't really set up a test to explore
the most likely space where this might be useful.  The tests were found
exploring a different class of constant synthesis problems.

If you were to dive into the before/after you'd see that the shNadd interacts
quite nicely with the recent bseti work.   The joys of recursion.

Probably the most controversial thing in here is using the "FMA" opcode to
stand in for when we want to use shNadd.  Essentially when we synthesize a
constant we generate a series of RTL opcodes and constants for emission by
another routine.   We don't really have a way to say we want a shift-add.  But
you can think of shift-add as a limited form of multiply-accumulate.  It's a
bit of a stretch, but not crazy bad IMHO.

Other approaches would be to store our own enum rather than an RTL opcode.  Or
store an actual generator function rather than any kind of opcode.

It wouldn't take much pushback over (ab)using FMA in this manner to get me to
use our own enums rather than RTL opcodes for this stuff.

gcc/

* config/riscv/riscv.cc (riscv_build_integer_1): Recognize cases where
we can use shNadd to improve constant synthesis.
(riscv_move_integer): Handle code generation for shNadd.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: Also count shNadd instructions.
* gcc.target/riscv/synthesis-3.c: New test.

i386: Improve V[48]QI shifts on AVX512/SSE4.1

The following one line patch improves the code generated for V8QI and V4QI
shifts when AV512BW and AVX512VL functionality is available.

For the testcase (from gcc.target/i386/vect-shiftv8qi.c):

typedef signed char v8qi __attribute__ ((__vector_size__ (8)));
v8qi foo (v8qi x) { return x >> 5; }

GCC with -O2 -march=cascadelake currently generates:

foo: movl    $67372036, %eax
        vpsraw  $5, %xmm0, %xmm2
        vpbroadcastd    %eax, %xmm1
        movl    $117901063, %eax
        vpbroadcastd    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vpternlogd      $120, -24(%rsp), %xmm2, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret

with this patch we now generate the much improved:

foo: vpmovsxbw       %xmm0, %xmm0
        vpsraw  $5, %xmm0, %xmm0
        vpmovwb %xmm0, %xmm0
        ret

This patch also fixes the FAILs of gcc.target/i386/vect-shiftv[48]qi.c
when run with the additional -march=cascadelake flag, by splitting these
tests into two; one form testing code generation with -msse2 (and
-mno-avx512vl) as originally intended, and the other testing AVX512
code generation with an explicit -march=cascadelake.

2024-05-10  Roger Sayle  <roger@nextmovesoftware.com>
    Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Don't attempt ix86_expand_vec_shift_qihi_constant on SSE4.1.

gcc/testsuite/ChangeLog
* gcc.target/i386/vect-shiftv4qi.c: Specify -mno-avx512vl.
* gcc.target/i386/vect-shiftv8qi.c: Likewise.
* gcc.target/i386/vect-shiftv4qi-2.c: New test case.
* gcc.target/i386/vect-shiftv8qi-2.c: Likewise.

pru: Fix register class checks in predicates

The register class checks in the multiply-source predicates was
incorrectly using the register number instead of the register
class for comparison.

gcc/ChangeLog:

* config/pru/predicates.md (pru_mulsrc0_operand): Use register
class instead of register number for the check.
(pru_mulsrc1_operand): Ditto.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

[PR114942][LRA]: Don't reuse input reload reg of inout early clobber operand

  The insn in question has the same reg in inout operand and input
operand.  The inout operand is early clobber.  LRA reused input reload
reg of the inout operand for the input operand which is wrong.  It
were a good decision if the inout operand was not early clobber one.
The patch rejects the reuse for the PR test case.

gcc/ChangeLog:

PR target/114942
* lra-constraints.cc (struct input_reload): Add new member early_clobber_p.
(get_reload_reg): Add new arg early_clobber_p, don't reuse input
reload with true early_clobber_p member value, use the arg for new
element of curr_insn_input_reloads.
(match_reload): Assign false to early_clobber_p member.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform):
Adjust get_reload_reg calls.

gcc/testsuite/ChangeLog:

PR target/114942
* gcc.target/i386/pr114942.c: New.

[prange] Fix thinko in prange::update_bitmask() [PR115026]

gcc/ChangeLog:

PR tree-optimization/115026
* value-range.cc (prange::update_bitmask): Use operand bitmask.

tree-optimization/114998 - use-after-free with loop distribution

When loop distribution releases a PHI node of the original IL it
can end up clobbering memory that's re-used when it upon releasing
its RDG resets all stmt UIDs back to -1, even those that got released.

The fix is to avoid resetting UIDs based on stmts in the RDG but
instead reset only those still present in the loop.

PR tree-optimization/114998
* tree-loop-distribution.cc (free_rdg): Take loop argument.
Reset UIDs of stmts still in the IL rather than all stmts
referenced from the RDG.
(loop_distribution::build_rdg): Pass loop to free_rdg.
(loop_distribution::distribute_loop): Likewise.
(loop_distribution::transform_reduction_loop): Likewise.

* gcc.dg/torture/pr114998.c: New testcase.

Allow patterns in SLP reductions

The following removes the over-broad rejection of patterns for SLP
reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
during pattern detection. That's also insufficient in case the
pattern only appears on the reduction path. Instead this implements
the proper correctness check in vectorizable_reduction and guides
SLP discovery to heuristically avoid forming later invalid groups.

I also couldn't find any testcase that FAILs when allowing the SLP
reductions to form so I've added one.

I came across this for single-lane SLP reductions with the all-SLP
work where we rely on patterns to properly vectorize COND_EXPR
reductions.

* tree-vect-patterns.cc (vect_pattern_recog_1): Do not
remove reductions involving patterns.
* tree-vect-loop.cc (vectorizable_reduction): Reject SLP
reduction groups with multiple lane-reducing reductions.
* tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
SLP reduction groups avoid including lane-reducing ones.

* gcc.dg/vect/vect-reduc-sad-9.c: New testcase.

AVR: target/114981 - Tweak __builtin_powif / __powisf2

Implement __powisf2 in assembly.

PR target/114981
libgcc/
* config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _powisf2.
(LIB1ASMFUNCS) [!avrtiny]: Add _powif.
* config/avr/lib1funcs.S (mov4): New .macro.
(L_powif, __powisf2) [!avrtiny]: New module and function.

gcc/testsuite/
* gcc.target/avr/pr114981-powif.c: New test.

bpf: fix printing of memory operands in pseudoc asm dialect

The BPF backend was emitting memory operands in pseudo-C syntax
without surrounding parentheses.  These were being provided in the
corresponding instruction templates.

This was causing GCC emitting invalid instructions when finding inline
assembly with memory operands like:

asm volatile (
"r1 = *(u64 *)%[ctx_a];"
"if r1 != 42 goto 1f;"
"r1 = *(u64 *)%[ctx_b];"
"if r1 != 42 goto 1f;"
"r1 = *(u64 *)%[ctx_c];"
"if r1 != 7 goto 1f;"
"r1 /= 0;"
"1:"
:
: [ctx_a]"m"(ctx.a),
  [ctx_b]"m"(ctx.b),
  [ctx_c]"m"(ctx.c)
: "r1"
);

This patch changes the backend to include the surrounding parentheses
in the printed representation of the memory operands (much like
surrounding brackets are included in normal asm syntax) and adapts the
impacted instruction templates accordingly.

Tested in target bpf-unknown-none, host x86_64-linux-gnu.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_print_operand_address): Include
surrounding parenthesis around mem operands in pseudoc asm
dialect.
* config/bpf/bpf.md (*mov<MM:mode>): Adapt accordingly.
(zero_extendhidi2): Likewise.
(zero_extendqidi2): Likewise.
(*extendsidi2): Likewise.
(*extendsidi2): Likewise.
(extendhidi2): Likewise.
(extendqidi2): Likewise.
(extendhisi2): Likewise.
* config/bpf/atomic.md (atomic_add<AMO:mode>): Likewise.
(atomic_and<AMO:mode>): Likewise.
(atomic_or<AMO:mode>): Likewise.
(atomic_xor<AMO:mode>): Likewise.
(atomic_fetch_add<AMO:mode>): Likewise.
(atomic_fetch_and<AMO:mode>): Likewise.
(atomic_fetch_or<AMO:mode>): Likewise.
(atomic_fetch_xor<AMO:mode>): Likewise.

c++, mingw: Fix up types of dtor hooks to __cxa_{,thread_}atexit/__cxa_throw on mingw ia32 [PR114968]

__cxa_atexit/__cxa_thread_atexit/__cxa_throw functions accept function
pointers to usually directly destructors rather than wrappers around
them.
Now, mingw ia32 uses implicitly __attribute__((thiscall)) calling
conventions for METHOD_TYPE (where the this pointer is passed in %ecx
register, the rest on the stack), so these functions use:
in config/os/mingw32/os_defines.h:
#if defined (__i386__)
#define _GLIBCXX_CDTOR_CALLABI __thiscall
#endif
in libsupc++/cxxabi.h
__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*) _GLIBCXX_NOTHROW;
__cxa_thread_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void *) _GLIBCXX_NOTHROW;
__cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
__attribute__((__noreturn__));

Now, mingw for some weird reason uses
#define TARGET_CXX_USE_ATEXIT_FOR_CXA_ATEXIT hook_bool_void_true
so it never actually uses __cxa_atexit, but does use __cxa_thread_atexit
and __cxa_throw.  Recent changes for modules result in more detailed
__cxa_*atexit/__cxa_throw prototypes precreated by the compiler, and if
that happens and one also includes <cxxabi.h>, the compiler complains about
mismatches in the prototypes.

One thing is the missing thiscall attribute on the FUNCTION_TYPE, the
other problem is that all of atexit/__cxa_atexit/__cxa_thread_atexit
get function pointer types created by a single function,
get_atexit_fn_ptr_type (), which creates it depending on if atexit
or __cxa_atexit will be used as either void(*)(void) or void(*)(void *),
but when using atexit and __cxa_thread_atexit it uses the wrong function
type for __cxa_thread_atexit.

The following patch adds a target hook to add the thiscall attribute to the
function pointers, and splits the get_atexit_fn_ptr_type () function into
get_atexit_fn_ptr_type () and get_cxa_atexit_fn_ptr_type (), the former always
creates shared void(*)(void) type, the latter creates either
void(*)(void*) (on most targets) or void(__attribute__((thiscall))*)(void*)
(on mingw ia32).  So that we don't waiste another GTY global tree for it,
because cleanup_type used for the same purpose for __cxa_throw should be
the same, the code changes it to use that type too.

In register_dtor_fn then based on the decision whether to use atexit,
__cxa_atexit or __cxa_thread_atexit it picks the right function pointer
type, and also if it decides to emit a __tcf_* wrapper for the cleanup,
uses that type for that wrapper so that it agrees on calling convention.

2024-05-10  Jakub Jelinek  <jakub@redhat.com>

PR target/114968
gcc/
* target.def (use_atexit_for_cxa_atexit): Remove spurious space
from comment.
(adjust_cdtor_callabi_fntype): New cxx target hook.
* targhooks.h (default_cxx_adjust_cdtor_callabi_fntype): Declare.
* targhooks.cc (default_cxx_adjust_cdtor_callabi_fntype): New
function.
* doc/tm.texi.in (TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Add.
* doc/tm.texi: Regenerate.
* config/i386/i386.cc (ix86_cxx_adjust_cdtor_callabi_fntype): New
function.
(TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Redefine.
gcc/cp/
* cp-tree.h (atexit_fn_ptr_type_node, cleanup_type): Adjust macro
comments.
(get_cxa_atexit_fn_ptr_type): Declare.
* decl.cc (get_atexit_fn_ptr_type): Adjust function comment, only
build type for atexit argument.
(get_cxa_atexit_fn_ptr_type): New function.
(get_atexit_node): Call get_cxa_atexit_fn_ptr_type rather than
get_atexit_fn_ptr_type when using __cxa_atexit.
(get_thread_atexit_node): Call get_cxa_atexit_fn_ptr_type
rather than get_atexit_fn_ptr_type.
(start_cleanup_fn): Add ob_parm argument, call
get_cxa_atexit_fn_ptr_type or get_atexit_fn_ptr_type depending
on it and create PARM_DECL also based on that argument.
(register_dtor_fn): Adjust start_cleanup_fn caller, use
get_cxa_atexit_fn_ptr_type rather than get_atexit_fn_ptr_type
for use_dtor casts.
* except.cc (build_throw): Use get_cxa_atexit_fn_ptr_type ().

[prange] Do not assume all pointers are the same size [PR115009]

In a world with same sized pointers we can always reuse the storage
slots, but since this is not always the case, we need to be more
careful. However, we can always store an undefined, because that
requires no extra storage.

gcc/ChangeLog:

PR tree-optimization/115009
* value-range-storage.cc (prange_storage::alloc): Do not assume
all pointers are the same size.
(prange_storage::prange_storage): Same.
(prange_storage::fits_p): Same.

RISC-V: Fix typos in code or comment [NFC]

Just found some typo when fixing bugs and then use aspell to find few
more typos, this patch didn't do anything other than fix typo.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix typos in comments.
(get_all_predecessors): Ditto.
(pre_vsetvl::m_unknow_info): Rename to...
(pre_vsetvl::m_unknown_info): this.
(pre_vsetvl::compute_vsetvl_def_data): Rename m_unknow_info to
m_unknown_info.
(pre_vsetvl::cleaup): Rename to...
(pre_vsetvl::cleanup): this.
(pre_vsetvl::compute_vsetvl_def_data): Fix typos.
(pass_vsetvl::lazy_vsetvl): Update function name and fix typos.
* config/riscv/riscv.cc: Fix typos in comments.
(struct machine_function): Fix typo in comments.
(riscv_valid_lo_sum_p): Ditto.
(riscv_force_address): Ditto.
(riscv_immediate_operand_p): Ditto.
(riscv_in_small_data_p): Ditto.
(riscv_first_stack_step): Ditto.
(riscv_expand_prologue): Ditto.
(riscv_convert_vector_chunks): Ditto.
(riscv_override_options_internal): Ditto.
(get_common_costs): Ditto.

driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

In GCC 14 we started to emit URLs for "command-line option <option> is
valid for <language> but not <another language>" and "-Werror= argument
'-Werror=<option>' is not valid for <language>" warnings. So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.

[committed] [RISC-V] Provide splitting guidance to combine to faciliate shNadd.uw generation

This fixes a minor code quality issue I found while comparing GCC and LLVM.
Essentially we want to do a bit of re-association to generate shNadd.uw
instructions.

Combine does the right thing and finds all the necessary instructions,
reassociates the operands, combines constants, etc. Where is fails is finding
a good split point. The backend can trivially provide guidance on how to split
via a define_split pattern.

This has survived both Ventana's internal CI system (rv64gcb) as well as my own
(rv64gc, rv32gcv).

I'll wait for the external CI system to give the all-clear before pushing.

gcc/
* config/riscv/bitmanip.md: Add splitter for shadd feeding another
add instruction.

gcc/testsuite/

* gcc.target/riscv/zba-shadduw.c: New test.

Revert: "Enable prange support." [PR114985]

This reverts commit 36e877996936abd8bd08f8b1d983c8d1023a5842 until the IPA
pass is fixed with regards to POINTER = POINTER <RELOP> POINTER.

Constant fold {-1,-1} << 1 in simplify-rtx.cc

This patch addresses a missed optimization opportunity in the RTL
optimization passes. The function simplify_const_binary_operation
will constant fold binary operators with two CONST_INT operands,
and those with two CONST_VECTOR operands, but is missing compile-time
evaluation of binary operators with a CONST_VECTOR and a CONST_INT,
such as vector shifts and rotates.

The first version of this patch didn't contain a switch statement to
explicitly check for valid binary opcodes, which bootstrapped and
regression tested fine, but my paranoia has got the better of me,
so this version now checks that VEC_SELECT or some funky (future)
rtx_code doesn't cause problems.

2024-05-09 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* simplify-rtx.cc (simplify_const_binary_operation): Constant
fold binary operations where the LHS is CONST_VECTOR and the
RHS is CONST_INT (or CONST_DOUBLE) such as vector shifts.

c++: failure to suppress -Wsizeof-array-div in template [PR114983]

-Wsizeof-array-div offers a way to suppress the warning by wrapping
the second operand of the division in parens:

  sizeof (samplesBuffer) / (sizeof(unsigned char))

but this doesn't work in a template, because we fail to propagate
the suppression bits.  Do it, then.

The finish_parenthesized_expr hunk is not needed because suppress_warning
isn't very fine-grained.  But I think it makes sense to be explicit and
not rely on OPT_Wparentheses also suppressing OPT_Wsizeof_array_div.

PR c++/114983

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case SIZEOF_EXPR>: Use copy_warning.
* semantics.cc (finish_parenthesized_expr): Also suppress
-Wsizeof-array-div.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wsizeof-array-div3.C: New test.

testsuite: Fix up pr84508* tests [PR84508]

The tests FAIL on x86_64-linux with
/usr/bin/ld: cannot find -lubsan
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: gcc.target/i386/pr84508-1.c (test for excess errors)
Excess errors:
/usr/bin/ld: cannot find -lubsan

The problem is that only *.dg/ubsan/ubsan.exp calls ubsan_init
which adds the needed search paths to libubsan library.
So, link/run tests for -fsanitize=undefined need to go into
gcc.dg/ubsan/ or g++.dg/ubsan/, even when they are target specific.

2024-05-09  Jakub Jelinek  <jakub@redhat.com>

PR target/84508
* gcc.target/i386/pr84508-1.c: Move to ...
* gcc.dg/ubsan/pr84508-1.c: ... here.  Restrict to i?86/x86_64
non-ia32 targets.
* gcc.target/i386/pr84508-2.c: Move to ...
* gcc.dg/ubsan/pr84508-2.c: ... here.  Restrict to i?86/x86_64
non-ia32 targets.

PR modula2/115003 exporting a symbol to outer scope with a name clash causes ICE

An ICE will occur if an unknown symbol is exported and causes a name
clash. The error mechanism attempts to find the scope of an unknown
symbol. This patch adds a missing case clause to GetScope and returns
NulSym if the scope is an unknown symbol.

gcc/m2/ChangeLog:

PR modula2/115003
* gm2-compiler/SymbolTable.mod (GetScope): Add UndefinedSym
case clause and return NulSym.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: lambda capturing structured bindings [PR85889]

<https://wg21.link/p1381r1> clarifies that it's OK to capture structured
bindings.

[expr.prim.lambda.capture]/4 says "The identifier in a simple-capture shall
denote a local entity" and [basic.pre]/3: "An entity is a [...] structured
binding".

It doesn't appear that this was made a DR, so, strictly speaking, we
should have a -Wc++20-extensions warning, like clang++.

PR c++/85889

gcc/cp/ChangeLog:

* lambda.cc (add_capture): Add a pedwarn for capturing structured
bindings.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/decomp3.C: Use -Wno-c++20-extensions.
* g++.dg/cpp1z/decomp60.C: New test.

Add myself to DCO

ChangeLog:

* MAINTAINERS: Add myself to DCO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

sra: Do not leave work for DSE (that it can sometimes not perform)

When looking again at the g++.dg/tree-ssa/pr109849.C testcase we
discovered that it generates terrible store-to-load forwarding stalls
because SRA was leaving behind aggregate loads but all the stores were
by scalar parts and DSE failed to remove the useless load.  SRA has
all the knowledge to remove the statement even now, so this small
patch makes it do so.

With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9
times faster (on an AMD EPYC 75F3 machine).

gcc/ChangeLog:

2024-04-18  Martin Jambor  <mjambor@suse.cz>

* tree-sra.cc (sra_modify_assign): Remove the original statement
also when dealing with a store to a fully covered aggregate from a
non-candidate.

gcc/testsuite/ChangeLog:

2024-04-23  Martin Jambor  <mjambor@suse.cz>

* g++.dg/tree-ssa/pr109849.C: Also check that the aggeegate store
to cur disappears.
* gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE,
check that the unwanted stores were removed at early SRA time.

Manually update entries for the Revert Revert commits.

contrib: Add 109f1b28fc94c93096506e3df0c25e331cef19d0 to ignored commits

2024-05-09 Jakub Jelinek <jakub@redhat.com>

* gcc-changelog/git_update_version.py: Replace
9dbff9c05520a74e6cd337578f27b56c941f64f3 with
39f81924d88e3cc197fc3df74204c9b5e01e12f7 and
109f1b28fc94c93096506e3df0c25e331cef19d0 in IGNORED_COMMITS.

Daily bump.

RISC-V: Make full-vec-move1.c test robust for optimization

During investigate the support of early break autovec, we notice
the test full-vec-move1.c will be optimized to 'return 0;' in main
function body.  Because somehow the value of V type is compiler
time constant,  and then the second loop will be considered as
assert (true).

Thus,  the ccp4 pass will eliminate these stmt and just return 0.

typedef int16_t V __attribute__((vector_size (128)));

int main ()
{
  V v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    (v)[i] = i;

  V res = v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    assert (res[i] == i); // will be optimized to assert (true)
}

This patch would like to introduce a extern function to use the res[i]
that get rid of the ccp4 optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.

Signed-off-by: Pan Li <pan2.li@intel.com>

contrib: Add 9dbff9c05520a74e6cd337578f27b56c941f64f3 to ignored commits

2024-05-09 Jakub Jelinek <jakub@redhat.com>

* gcc-changelog/git_update_version.py: Add
9dbff9c05520a74e6cd337578f27b56c941f64f3 to IGNORED_COMMITS.

testsuite: Fix up vector-subaccess-1.C test for ia32 [PR89224]

The test FAILs on i686-linux due to
.../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
excess warnings.

This fixes it by adding -Wno-psabi, like commonly done in other tests.

2024-05-09 Jakub Jelinek <jakub@redhat.com>

PR c++/89224
* g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional
options.

MIPS: Support constraint 'w' for MSA instruction

Support syntax like:
asm volatile ("fmadd.d %w0, %w1, %w2" : "+w"(a): "w"(b), "w"(c));

gcc
* config/mips/constraints.md: Add new constraint 'w'.

gcc/testsuite
* gcc.target/mips/msa-inline-asm.c: New test.

RISC-V: Add tests for cpymemsi expansion

cpymemsi expansion was available for RISC-V since the initial port.
However, there are not tests to detect regression.
This patch adds such tests.

Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

i386: Fix some intrinsics without alignment requirements.

gcc/ChangeLog:

PR target/84508
* config/i386/emmintrin.h
(_mm_load_sd): Remove alignment requirement.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
(_mm_storel_pd): Add alignment requirement.
* config/i386/xmmintrin.h
(_mm_loadh_pi): Remove alignment requirement.
(_mm_loadl_pi): Ditto.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.

gcc/testsuite/ChangeLog:

PR target/84508
* gcc.target/i386/pr84508-1.c: New test.
* gcc.target/i386/pr84508-2.c: Ditto.

[ranger] Force buffer alignment in Value_Range [PR114912]

gcc/ChangeLog:

PR tree-optimization/114912
* value-range.h (class Value_Range): Use a union.

[prange] Reword dispatch error message

After reading the ICE for the PR, it's obvious the error message is
rather cryptic. This makes it less so.

gcc/ChangeLog:

* range-op.cc (range_op_handler::discriminator_fail): Reword error
message.

i386: fix ix86_hardreg_mov_ok with lra_in_progress

Originally eliminate_regs_in_insnit will transform
(parallel [
  (set (reg:QI 130)
    (plus:QI (subreg:QI (reg:DI 19 frame) 0)
      (const_int 96)))
  (clobber (reg:CC 17 flag))]) {*addqi_1}
to
(set (reg:QI 130)
(subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal}
when verify_changes.

But with No Flags add, it transforms
(set (reg:QI 5 di)
  (plus:QI (subreg:QI (reg:DI 19 frame) 0)
   (const_int 96))) {*addqi_1_nf}
to
(set (reg:QI 5 di)
(subreg:QI (reg:DI 19 frame) 0)) {*addqi_1_nf}.
there is no extra clobbers at the end, and
its dest reg just is a hardreg. For ix86_hardreg_mov_ok,
it returns false. So it fails to update insn and causes
the ICE when transform to movqi_internal.

But actually it is ok and safe for ix86_hardreg_mov_ok
when lra_in_progress.

And tested the spec2017, the performance was not affected.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_hardreg_mov_ok): Relax
hard reg mov restriction when lra in progress.

[PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

1 This patch implements the Nan-box of bf16.

2 Please refer to the Nan-box implementation of hf16 in:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=057dc349021660c40699fb5c98fd9cac8e168653>

3 The discussion about Nan-box can be found on the website:
<https://www.mail-archive.com/search?q=Nan-box+the+result+of+movhf+on+soft-fp16&l=gcc-patches%40gcc.gnu.org>

4 Below test are passed for this patch
* The riscv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.

[RISC-V][V2] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis

Reposting without the patch that ignores whitespace.  The CI system doesn't
like including both patches, that'll generate a failure to apply and none of
the tests actually get run.

So I managed to goof the if-then-else level of the bseti bits last week.  They
were supposed to be a last ditch effort to improve the result, but ended up
inside a conditional where they don't really belong.  I almost always use Zba,
Zbb and Zbs together, so it slipped by.

So it's NFC if you always test with Zbb and Zbs enabled together.  But if you
enabled Zbs without Zbb you'd see a failure to use bseti.

gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Fix incorrect
if-then-else nesting of Zbs code.

AVR: target/114981 - Support __builtin_powi[l] / __powidf2.

This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.

PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.

gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.

[PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.

  PR114810 was fixed in machine-dependent way.  This patch is a fix of
the PR on LRA side.  LRA chose alternative with constraints `&r,r,ro`
on i686 when all operands of DImode and there are only 6 available
general regs.  The patch recognizes such case and significantly
increase the alternative cost.  It does not reject alternative
completely.  So the fix is safe but it might not work for all
potentially possible cases of registers lack as register classes can
have any relations including subsets and intersections.

gcc/ChangeLog:

PR target/114810
* lra-constraints.cc (process_alt_operands): Calculate union reg
class for the alternative, peak matched regs and required reload
regs.  Recognize alternatives with lack of available registers and
make them costly.  Add debug print about this case.

c++: #pragma doesn't disable -Wunused-label [PR113582]

The PR complains that

  void do_something(){
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunused-label"
    start:;
    #pragma GCC diagnostic pop
  } #1

doesn't work.  That's because we warn_for_unused_label only while we're
in finish_function, meaning we're at #1 where we're outside the #pragma
region.  We can use suppress_warning + warning_suppressed_p to fix this.

Note that I'm not using TREE_USED.  Propagating it in tsubst_stmt/LABEL_EXPR
from decl to label would mean that we don't warn in do_something2, but
I think we want the warning there: we're in a template and the goto is
a discarded statement.

PR c++/113582

gcc/c-family/ChangeLog:

* c-warn.cc (warn_for_unused_label): Don't warn if -Wunused-label has
been suppressed for the label.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_label_for_labeled_statement): suppress_warning
if it's not enabled at input_location.
* pt.cc (tsubst_stmt): Call copy_warning.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wunused-label-4.C: New test.

match: `a CMP nonnegative ? a : ABS<a>` simplified to just `ABS<a>` [PR112392]

We can optimize `a == nonnegative ? a : ABS<a>`, `a > nonnegative ? a : ABS<a>`
and `a >= nonnegative ? a : ABS<a>` into `ABS<a>`. This allows removal of
some extra comparison and extra conditional moves in some cases.
I don't remember where I had found though but it is simple to add so
let's add it.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note I have a secondary pattern for the equal case as either a or nonnegative
could be used.

PR tree-optimization/112392

gcc/ChangeLog:

* match.pd (`x CMP nonnegative ? x : ABS<x>`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS<x>`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-41.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code at the use points with loop having same
nesting depth. The following patch improves code sinking by placing the sunk
code in begining of the block after the labels.

2024-05-08 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location):Sink statements at
the begining of the basic block after labels.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.

RISC-V: Cover sign-extensions in lshr<GPR:mode>3_zero_extend_4

The lshr<GPR:mode>3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.

The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): Rename to...
(*<any_extract:optab><GPR:mode>3):...this and add support for
sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Add zero_extract support for rv64gc

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instructions. Let's add a insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.

Tested with SPEC CPU 2017 (rv64gc).

PR target/111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2

The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.

The same optimziation can be applied to sign-extensions when emitting
a sraiw instead of the srliw.

Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Add test for sraiw-31 special case

We already optimize a sign-extension of a right-shift by 31 in
<optab>si3_extend. Let's add a test for that (similar to
zero-extend-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Fix SLP reduction initial value for pointer reductions

For pointer reductions we need to convert the initial value to
the vector component integer type.

* tree-vect-loop.cc (get_initial_defs_for_reduction): Convert
initial value to the vector component type.

Fix non-grouped SLP load/store accounting in alignment peeling

When we have a non-grouped access we bogously multiply by zero.
This shows most with single-lane SLP but also happens with
the multi-lane splat case.

* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
Properly guard DR_GROUP_SIZE access with STMT_VINFO_GROUPED_ACCESS.

aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

This fixes a typo in combine_reg_notes in the load/store pair fusion
pass. As it stands, the calls to filter_notes store any
REG_FRAME_RELATED_EXPR to fr_expr with the following association:

- i2 -> fr_expr[0]
- i1 -> fr_expr[1]

but then the checks inside the following if statement expect the
opposite (more natural) association, i.e.:

- i2 -> fr_expr[1]
- i1 -> fr_expr[0]

this patch fixes the oversight by swapping the fr_expr indices in the
calls to filter_notes.

In hindsight it would probably have been less confusing / error-prone to
have combine_reg_notes take an array of two insns, then we wouldn't have
to mix 1-based and 0-based indexing as well as remembering to call
filter_notes in reverse program order. This however is a minimal fix
for backporting purposes.

gcc/ChangeLog:

PR target/114936
* config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
FR_EXPR[N-1], thus matching the correspondence expected by the
copy_rtx calls.

tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

This fixes a couple of tests (gcc.dg/vect/pr109011-*.c) on s390 where
loops are unrolled although -fno-unroll-loops is specified.

gcc/ChangeLog:

* tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
-fno-unroll-loops.

AVR: target/114975 - Add combine-pattern for __parityqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit parity detection.

gcc/testsuite/
* gcc.target/avr/pr114975-parity.c: New test.

AVR: target/114975 - Add combine-pattern for __popcountqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit popcount detection.

gcc/testsuite/
* gcc.target/avr/pr114975-popcount.c: New test.

Fix and speedup IDF pruning by dominator

When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks it compares against ENTRY_BLOCK but that's never
going to be the common dominator. In fact if it ever were the code
fails to copy IDF to PRUNED_IDF, leading to wrong code.

The following fixes that by avoiding the copy and pruning from the
IDF in-place as well as using the more approprate check against
the single successor of the ENTRY_BLOCK.

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK. Do not copy IDF but prune it directly.

reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]

The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
          if (entry_test_needed)
            {
              tem = build_range_check (loc, optype, unshare_expr (exp),
                                       false, lowi, high);
              if (tem == NULL_TREE || is_gimple_val (tem))
                continue;
            }
so during the bit test we already know that exp is in the [lowi, high]
range, but skips it if we have range info which tells us this isn't
necessary.
Also, normally it emits shifts by exp - lowi counter, but has an
optimization to use just exp counter if the mask isn't a more expensive
constant in that case and lowi is > 0 and high is smaller than prec.

The following testcase is miscompiled because the two abnormal cases
are triggered.  The range of exp is [43, 43][48, 48][95, 95], so we on
64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that, we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1))
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't do both unless r.upper_bound () is < prec.

The following patch ensures that.

2024-05-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.

* gcc.c-torture/execute/pr114965.c: New test.

Minor tweaks to code computing modular multiplicative inverse

This removes the last parameter of choose_multiplier, which is unused, adds
another assertion and more details to the description and various comments.
Likewise to the closely related invert_mod2n, except for the last parameter.

[changelog]
* expmed.h (choose_multiplier): Tweak description and remove last
parameter.
* expmed.cc (choose_multiplier): Likewise. Add assertion for the
third parameter and adds details to various comments.
(invert_mod2n): Tweak description and add assertion for the first
parameter.
(expand_divmod): Adjust calls to choose_multiplier.
* tree-vect-generic.cc (expand_vector_divmod): Likewise.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Likewise.

x86: Fix cmov cost model issue [PR109549]

(if_then_else:SI (eq (reg:CCZ 17 flags)
        (const_int 0 [0]))
    (reg/v:SI 101 [ e ])
    (reg:SI 102))
The cost is 8 for the rtx, the cost for
(eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4,
but this is just an operator do not need to compute it's cost in cmov.

gcc/ChangeLog:

PR target/109549
* config/i386/i386.cc (ix86_rtx_costs): The XEXP (x, 0) for cmov
is an operator do not need to compute cost.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cmov6.c: Fixed.

Enable prange support.

This throws the switch on prange.  After this patch, it is no longer
valid to store a pointer in an irange (or vice versa).  Instead, they
must go in prange, which is faster and more memory efficient.

I will push this now, so I have time to do any follow-up bugfixing
before going on paternity leave.

There are various cleanups we plan on doing after this patch (faster
intersect/union, remove range-op-mixed.h, remove value_range in favor
of int_range_max, reclaim the name for the Value_Range temporary,
clean up range-ops, etc etc).  But we will hold off on those for now
to make it easier to revert this patch, if for some reason we need to
do so while I'm away.

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-cache.cc (sbr_sparse_bitmap::sbr_sparse_bitmap):
Change irange to prange.
* gimple-range-fold.cc (fold_using_range::fold_stmt): Same.
(fold_using_range::range_of_address): Same.
* gimple-range-fold.h (range_of_address): Same.
* gimple-range-infer.cc (gimple_infer_range::add_nonzero): Same.
* gimple-range-op.cc (class cfn_strlen): Same.
* gimple-range-path.cc
(path_range_query::adjust_for_non_null_uses): Same.
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses): Same.
* tree-ssa-structalias.cc (find_what_p_points_to): Same.
* range-op-ptr.cc (range_op_table::initialize_pointer_ops): Remove
hybrid entries in table.
* range-op.cc (range_op_table::range_op_table): Add pointer
entries for bitwise and/or and min/max.
* value-range.cc (irange::verify_range): Add assert.
* value-range.h (irange::varying_compatible_p): Remove check for
error_mark_node.
(irange::supports_p): Remove pointer support.
* ipa-cp.h (ipa_supports_p): Add prange support.

Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle xpass from combine improvement""

This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.

c++/modules: Stream unmergeable temporaries by value again [PR114856]

In r14-9266-g2823b4d96d9ec4 I gave all temporary vars a DECL_CONTEXT,
including those at namespace or global scope, so that they could be
properly merged across importers. However, not all of these temporary
vars are actually supposed to be mergeable.

For instance, in the attached testcase we have an unnamed temporary var
used in the NSDMI of a class member, which cannot properly merged -- but
it also doesn't need to be, as it'll be thrown away when the class type
itself is merged anyway.

This patch reverts the change made above and instead makes a weaker
adjustment that only causes temporary vars with linkage have a
DECL_CONTEXT to merge from. This way these unnamed, "unmergeable"
temporaries are properly streamed by value again.

PR c++/114856

gcc/cp/ChangeLog:

* call.cc (make_temporary_var_for_ref_to_temp): Set context for
temporaries with linkage.
* init.cc (create_temporary_var): Revert to only set context
when in a function decl.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114856.h: New test.
* g++.dg/modules/pr114856_a.H: New test.
* g++.dg/modules/pr114856_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers.
So if we had `constvector[0]`, the type of the element of the array would not have const on it.
This was due to a missing build_qualified_type for the inner type of the vector when building the array type.
We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So the
overloads and even if it is a lvalue or rvalue is correctly done.

Note we correctly now reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb.

Built and tested for aarch64-linux-gnu.

PR c++/89224

gcc/c-family/ChangeLog:

* c-common.cc (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

DCE __cxa_atexit calls where the function is pure/const [PR19661]

In C++ sometimes you have a deconstructor function which is "empty", like for an
example with unions or with arrays.  The front-end might not know it is empty either
so this should be done on during optimization.o
To implement it I added it to DCE where we mark if a statement is necessary or not.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Changes since v1:
  * v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
        Add cxa_atexit-5.C testcase for -fPIC case.
  * v3: Fix testcases for the __aeabi_atexit (forgot to do in the v2).

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

This adds a few more of what is currently done in phiopt's value_replacement
to match. I noticed this when I was hooking up phiopt's value_replacement
code to use match and disabling the old code. But this can be done
independently from the hooking up phiopt's value_replacement as phiopt
is already hooked up for simplified versions already.

/* a != 0 ? a / b : 0 -> a / b iff b is nonzero. */
/* a != 0 ? a * b : 0 -> a * b */
/* a != 0 ? a & b : 0 -> a & b */

We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
uses that form.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114894

gcc/ChangeLog:

* match.pd (`a != 0 ? a / b : 0`): New pattern.
(`a != 0 ? a * b : 0`): New pattern.
(`a != 0 ? a & b : 0`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-5.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

Per quick email exchange with Palmer. Given the triviality, I'm just pushing
it.

gcc/
* config/riscv/riscv.cc (generic_ooo_tune_info): Turn on
overlap_op_by_pieces.

[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.

This patch adds the that capability in our tuning structure. It's off for all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success. So technically it's NFC upstream, but puts in the infrastructure
multiple organizations likely need.

gcc/

* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.

gcc/testsuite/

* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.

c++: Implement C++26 P2893R3 - Variadic friends [PR114459]

The following patch imeplements the C++26 P2893R3 - Variadic friends
paper.  The paper allows for the friend type declarations to specify
more than one friend type specifier and allows to specify ... at
the end of each.  The patch doesn't introduce tentative parsing of
friend-type-declaration non-terminal, but rather just extends existing
parsing where it is a friend declaration which ends with ; after the
declaration specifiers to the cases where it ends with ...; or , or ...,
In that case it pedwarns for cxx_dialect < cxx26, handles the ... and
if there is , continues in a loop to parse the further friend type
specifiers.

2024-05-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/114459
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_variadic_friend=202403L for C++26.
gcc/cp/
* parser.cc (cp_parser_member_declaration): Implement C++26
P2893R3 - Variadic friends.  Parse friend type declarations
with ... or with more than one friend type specifier.
* friend.cc (make_friend_class): Allow TYPE_PACK_EXPANSION.
* pt.cc (instantiate_class_template): Handle PACK_EXPANSION_P
in friend classes.
gcc/testsuite/
* g++.dg/cpp26/feat-cxx26.C (__cpp_variadic_friend): Add test.
* g++.dg/cpp26/variadic-friend1.C: New test.

expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

The HF and BF modes have the same size/precision and neither is
a subset nor superset of the other.
So, using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.

I think it is easier to switch to using what is available rather than
adding new entrypoints to libgcc, even alias, because this is backportable.

2024-05-07 Jakub Jelinek <jakub@redhat.com>

PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.

* gcc.dg/pr114907.c: New test.

tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly from command line options sanitized.

This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase the .ASAN_MARK ifn calls
gimplifier adds can result in ICEs.

Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.

2024-05-07 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.

* gcc.dg/asan/pr114956.c: New test.

c++: DECL_DECOMPOSITION_P cleanup

DECL_DECOMPOSITION_P already checks VAR_P but we repeat the check
in a lot of places.

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Don't check VAR_P before
DECL_DECOMPOSITION_P.
* init.cc (build_aggr_init): Likewise.
* parser.cc (cp_parser_range_for): Likewise.
(do_range_for_auto_deduction): Likewise.
(cp_convert_range_for): Likewise.
(cp_convert_omp_range_for): Likewise.
(cp_finish_omp_range_for): Likewise.
* pt.cc (extract_locals_r): Likewise.
(tsubst_omp_for_iterator): Likewise.
(tsubst_decomp_names): Likewise.
(tsubst_stmt): Likewise.
* typeck.cc (maybe_warn_about_returning_address_of_local): Likewise.