git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: testsuite: Add missing -mabi=lp64d.

This fixes more cases of missing -mabi=lp64d.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: Add
-mabi=lp64d.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Dito.

RISC-V: Set the natural size of constant vector mask modes to one RVV data vector.

If reinterpret vnx2bi as vnx16qi, vnx16qi must occupy no more of the underlying
registers than vnx2bi.

Consider this following case:
void test_vreinterpret_v_b64_i8m1 (uint8_t *in, int8_t *out)
{
  vbool64_t vmask = __riscv_vlm_v_b64 (in, 2);
  vint8m1_t vout = __riscv_vreinterpret_v_b64_i8m1 (vmask);
  __riscv_vse8_v_i8m1(out, vout, 16);
}

compiler parameters: -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -O3
Compilation fails with:
test_vreinterpret_v_b64_i8m1during RTL pass: expand

test.c: In function 'test_vreinterpret_v_b64_i8m1':
test.c:11:22: internal compiler error: in gen_lowpart_general, at rtlhooks.cc:57
   11 |     vint8m1_t vout = __riscv_vreinterpret_v_b64_i8m1(src);
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0xf11876 gen_lowpart_general(machine_mode, rtx_def*)
        ../.././riscv-gcc/gcc/rtlhooks.cc:57
0x191435e gen_vreinterpretvnx16qi(rtx_def*, rtx_def*)
        ../.././riscv-gcc/gcc/config/riscv/vector.md:486
0xe08858 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        ../.././riscv-gcc/gcc/optabs.cc:8213
0x1471209 riscv_vector::function_expander::generate_insn(insn_code)
        ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.cc:3813
0x147629c riscv_vector::function_expander::expand()
        ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.h:520
0x147629c riscv_vector::expand_builtin(unsigned int, tree_node*, rtx_def*)
        ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.cc:4103
0x9868f9 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
        ../.././riscv-gcc/gcc/builtins.cc:7342

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_regmode_natural_size): set the natural
size of vector mask mode to one rvv register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vreinterpet-fixed.c: New test.

RISC-V: Optimize codegen of VLA SLP

Add comments for Robin:
We want to create a pattern where value[ix] = floor (ix / NPATTERNS).
As NPATTERNS is always a power of two we can rewrite this as
= ix & -NPATTERNS.
`
Recently, I figure out a better approach in case of codegen for VLA stepped vector.

Here is the detail descriptions:

Case 1:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 37] + 1;
      a[i * 8 + 1] = b[i * 8 + 37] + 2;
      a[i * 8 + 2] = b[i * 8 + 37] + 3;
      a[i * 8 + 3] = b[i * 8 + 37] + 4;
      a[i * 8 + 4] = b[i * 8 + 37] + 5;
      a[i * 8 + 5] = b[i * 8 + 37] + 6;
      a[i * 8 + 6] = b[i * 8 + 37] + 7;
      a[i * 8 + 7] = b[i * 8 + 37] + 8;
    }
}

We need to generate the stepped vector:
NPATTERNS = 8.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }

Before this patch:
vid.v    v4         ;; {0,1,2,3,4,5,6,7,...}
vsrl.vi  v4,v4,3    ;; {0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
li       a3,8       ;; {8}
vmul.vx  v4,v4,a3   ;; {0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

After this patch:
vid.v    v4                    ;; {0,1,2,3,4,5,6,7,...}
vand.vi  v4,v4,-8(-NPATTERNS)  ;; {0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

Case 2:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 2] + 2;
      a[i * 8 + 2] = b[i * 8 + 1] + 3;
      a[i * 8 + 3] = b[i * 8 + 0] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 6] + 6;
      a[i * 8 + 6] = b[i * 8 + 5] + 7;
      a[i * 8 + 7] = b[i * 8 + 4] + 8;
    }
}

We need to generate the stepped vector:
NPATTERNS = 4.
{ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

Before this patch:
li       a6,134221824
slli     a6,a6,5
addi     a6,a6,3        ;; 64-bit: 0x0003000200010000
vmv.v.x  v6,a6          ;; {3, 2, 1, 0, ... }
vid.v    v4             ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vsrl.vi  v4,v4,2        ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
li       a3,4           ;; {4}
vmul.vx  v4,v4,a3       ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
vadd.vv  v4,v4,v6       ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

After this patch:
li a3,-536875008
slli a3,a3,4
addi a3,a3,1
slli a3,a3,16
vmv.v.x v2,a3           ;; {3, 1, -1, -3, ... }
vid.v v4              ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vadd.vv v4,v4,v2        ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.

RISC-V: testsuite: Add -Wno-psabi to vec_set/vec_extract testcases.

This fixes some fallout from the recent psabi changes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Add
-Wno-psabi.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Dito.

RISC-V: Fix compiler warning of riscv_arg_has_vector

Hi,

This little patch fixes a compile warning issue that my previous patch
introduced, sorry for introducing this issue.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_arg_has_vector): Add default
switch handler.

RISC-V: testsuite: Fix vmul test expectation and fix -ffast-math.

I forgot to check for vfmul in the multiplication tests as well as
some -ffast-math arguments. Fix this.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add
-ffast-math.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Remove
-ffast-math
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Check for
vfmul.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Dito.

Fortran: Fix parse-dump-tree for OpenMP ALLOCATE clause

Commit r14-1301-gd64e8e1224708e added u2.allocator to gfc_omp_namelist
for better readability and to permit to use namelist->expr for code
like the following:
  !$omp allocators allocate(align(32) : dt%alloc_comp)
    allocate (dt%alloc_comp(5))
  !$omp allocate(dt%alloc_comp2) align(64)
    allocate (dt%alloc_comp2(10))
However, for the parse-tree dump the change was incomplete.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_namelist): Fix dump of the allocator
modifier of OMP_LIST_ALLOCATE.

ada: Minor tweaks

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Pass
the NULL_TREE explicitly and test imported_p in lieu of
Is_Imported. <E_Function>: Remove public_flag local variable and
make extern_flag local variable a constant.

ada: Fix crash on inlining in GNATprove

After the recent change on detection of non-inlining, calls inside
the iterator part of a quantified expression were not considered
as preventing inlining anymore, leading to a crash later on inside
GNATprove. Now fixed.

gcc/ada/

* sem_res.adb (Resolve_Call): Fix change that replaced test for
quantified expressions by the test for potentially unevaluated
contexts. Both should be performed.

ada: Further fixes to handling of private views in instances

This removes more bypasses for private views in instances that are present
in type predicates (Conforming_Types, Covers, Specific_Type and Wrong_Type),
which in exchange requires additional work in Sem_Ch12 to restore the proper
view of types during the instantiation of generic bodies.

The main mechanism for this is the Has_Private_View flag, but it comes with
the limitations that 1) there must be a direct reference to the global type
in the generic construct (either a reference to a global object of this type
or the explicit declaration of a local object of this type), which is not
always the case e.g. for loop parameters and 2) it can deal with a single
type at a time, e.g. it cannot deal with an array type and its component
type if their respective views are not the same in the instance.

To overcome the second limitation, a new Has_Secondary_Private_View flag
is introduced to deal with a secondary type, which as of this writing is
either the component type of an array type or the designated type of an
access type (together they make up the vast majority of the problematic
cases for the Has_Private_View flag alone). This new mechanism subsumes
a specific treatment for them that was added in Copy_Generic_Node a few
years ago, although a specific treatment still needs to be preserved for
comparison and equality operators in a narrower case.

Additional handling is also introduced to overcome the first limitation
for loop parameters in Copy_Generic_Node, and a relaxed condition is used
in Exp_Ch7.Convert_View to generate an unchecked conversion between views.

gcc/ada/

* exp_ch7.adb (Convert_View): Detect more cases of mismatches for
private types and use Implementation_Base_Type as main criterion.
* gen_il-fields.ads (Opt_Field_Enum): Add
Has_Secondary_Private_View
* gen_il-gen-gen_nodes.adb (N_Expanded_Name): Likewise.
(N_Direct_Name): Likewise.
(N_Op): Likewise.
* sem_ch12.ads (Check_Private_View): Document the usage of second
flag Has_Secondary_Private_View.
* sem_ch12.adb (Get_Associated_Entity): New function to retrieve
the ultimate associated entity, if any.
(Check_Private_View): Implement Has_Secondary_Private_View
support.
(Copy_Generic_Node): Remove specific treatment for Component_Type
of an array type and Designated_Type of an access type. Add
specific treatment for comparison and equality operators, as well
as iterator and loop parameter specifications.
(Instantiate_Type): Implement Has_Secondary_Private_View support.
(Requires_Delayed_Save): Call Get_Associated_Entity.
(Set_Global_Type): Implement Has_Secondary_Private_View support.
* sem_ch6.adb (Conforming_Types): Remove bypass for private views
in instances.
* sem_type.adb (Covers): Return true if Is_Subtype_Of does so.
Remove bypass for private views in instances.
(Specific_Type): Likewise.
* sem_util.adb (Wrong_Type): Likewise.
* sinfo.ads (Has_Secondary_Private_View): Document new flag.

ada: Remove outdated comment

The Preelaborate pragma the removed comment was referring to was
indeed present in AI 167, as well as in clause 5.3 of the rationale
for Ada 2012, but it never made it into the 2012 version of the
reference manual.

gcc/ada/

* libgnarl/s-mudido.ads: Remove outdated comment.

Fortran's gfc_match_char: %S to match symbol with host_assoc

gfc_match ("... %s ...", ...) matches a gfc_symbol but with
host_assoc = 0. This commit adds '%S' as variant which matches
with host_assoc = 1

gcc/fortran/ChangeLog:

* match.cc (gfc_match_char): Match with '%S' a symbol
with host_assoc = 1.

Improve DSE to handle stores before __builtin_unreachable ()

DSE isn't good at identifying program points that end lifetime
of variables that are not associated with virtual operands. But
at least for those that end basic-blocks we can handle the simple
case where this ending is in the same basic-block as the definition
we want to elide. That should catch quite some common cases already.

* tree-ssa-dse.cc (dse_classify_store): When we found
no defs and the basic-block with the original definition
ends in __builtin_unreachable[_trap] the store is dead.

* gcc.dg/tree-ssa/ssa-dse-47.c: New testcase.
* c-c++-common/asan/pr106558.c: Avoid undefined behavior
due to missing return.

Update virtual SSA form manually where easily possible in phiprop

This keeps virtual SSA form up-to-date in phiprop when easily possible.
Only when we deal with aggregate copies the work would be too
heavy-handed in general.

* tree-ssa-phiprop.cc (phiprop_insert_phi): For simple loads
keep the virtual SSA form up-to-date.

aarch64: Optimise ADDP with same source operands

We've been asked to optimise the testcase in this patch of a 64-bit ADDP with
the low and high halves of the same 128-bit vector. This can be done by a
single .4s ADDP followed by just reading the bottom 64 bits. A splitter for
this is quite straightforward now that all the vec_concat stuff is collapsed
by simplify-rtx.

With this patch we generate a single:
addp v0.4s, v0.4s, v0.4s
instead of:
        dup     d31, v0.d[1]
        addp    v0.2s, v0.2s, v31.2s
        ret

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_addp_same_reg<mode>):
New define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/addp-same-low_1.c: New test.

AArch64: remove test comment from *mov<mode>_aarch64

I accidentally left a test comment in the final version of the patch.
This removes the comment.

gcc/ChangeLog:

* config/aarch64/aarch64.md (*mov<mode>_aarch64): Drop test comment.

ada: Fix couple of issues in documentation of overflow checking

There is still a mention of the defunct CHECKED mode and the Default
Settings paragraph is confusing with regard to the -gnato switch.

gcc/ada/

* doc/gnat_ugn/gnat_and_program_execution.rst (Overflows in GNAT)
<Default Settings>: Remove obsolete paragraph about -gnato.
<Implementation Notes>: Replace CHECKED with STRICT.
* gnat_ugn.texi: Regenerate.

ada: Do not issue warning on postcondition in some cases

Warning on suspicious postcondition is not relevant if contract
Exceptional_Cases is present, or if contract Always_Terminates is
present with a non-statically True value, as in those cases the
postcondition can be used to indicate constraints on those pre-state
for which the subprogram might terminate normally.

gcc/ada/

* sem_util.adb (Check_Result_And_Post_State): Do not warn in cases
where the warning could be spurious.

ada: Add the ability to add error codes to error messages

Add a new character sequence [] for error codes in error messages
handled by Error_Msg procedures, to use for SPARK-related errors.
Display of additional information on the error or warning based on
the error code is delegated to GNATprove.

gcc/ada/

* err_vars.ads (Error_Msg_Code): New variable for error codes.
* errout.adb (Error_Msg_Internal): Display continuation message
when an error code was present.
(Set_Msg_Text): Handle character sequence [] for error codes.
* errout.ads: Document new insertion sequence [].
(Error_Msg_Code): New renaming.
* erroutc.adb (Prescan_Message): Detect presence of error code.
(Set_Msg_Insertion_Code): Handle new insertion sequence [].
* erroutc.ads (Has_Error_Code): New variable for prescan.
(Set_Msg_Insertion_Code): Handle new insertion sequence [].
* contracts.adb (Check_Type_Or_Object_External_Properties):
Replace reference to SPARK RM section by an error code.
* sem_elab.adb (SPARK_Processor): Same.
* sem_prag.adb (Check_Missing_Part_Of): Same.
* sem_res.adb (Resolve_Actuals, Resolve_Entity_Name): Same.

ada: Fix for attribute Range in Exceptional_Cases

Attribute Range is now handled like First and Last when occurring within
the consequence of Exceptional_Cases, i.e. attribute Range is not
considered to be a read of a formal parameter that would not be allowed
in the contract.

gcc/ada/

* sem_res.adb (Resolve_Entity_Name): Handle Range like First and Last.

ada: Document partition-wide Ada signal handlers

Indicate the signal handlers that are set by the Ada
run time, and explain how to prevent them if needed.

gcc/ada/

* doc/gnat_ugn/the_gnat_compilation_model.rst
(Partition-Wide Settings): add this subsection to document
configuration settings made by the Ada run time.
* gnat_ugn.texi: Regenerate.

ada: Fix for quantified expressions in Exceptional_Cases

When detecting illegal uses of formal parameters of the current
subprogram in contract of its Exceptional_Cases, we relied on the
Current_Scope. However, quantified expressions introduce an implicit
scope, which we need to take into account.

gcc/ada/

* sem_res.adb (Resolve_Entity_Name): Ignore implicit loop scopes
introduced by quantified expressions.

ada: Fix bug in predicate checks with address clauses

This patch fixes a compiler bug triggered by having a type with some
defaulted components, and a predicate, and an object of that type with
an address clause. In this case, the compiler was crashing.

gcc/ada/

* sem_ch3.adb (Analyze_Object_Declaration): Remove predicate-check
generation if there is an address clause. These are unnecessary,
and cause gigi to crash.
* exp_util.ads (Following_Address_Clause): Remove obsolete "???"
comments. The suggested changes were done long ago.

ada: Fix fallout of fix to handling of private views in instances

Check_Actual_Type incorrectly switches the view of a private type declared
in the enclosing scope of a generic unit but that has a private ancestor.

gcc/ada/

* einfo.ads (Has_Private_Ancestor): Fix inaccuracy in description.
* sem_ch12.adb (Check_Actual_Type): Do not switch the view of the
type if it has a private ancestor.

ada: Add CHERI intrinsic bindings and helper functions.

The package Interfaces.CHERI provides intrinsic bindings and
helper functions to allow software to query, create, and
manipulate CHERI capabilities.

gcc/ada/

* libgnat/i-cheri.ads: Add CHERI intrinsics and helper functions.
* libgnat/i-cheri.adb: Likewise

ada: Small fixes to handling of private views in instances

The main change is the removal of the special bypass for private views in
Resolve_Implicit_Dereference, which in exchange requires additional work
in Check_Generic_Actuals and a couple more calls to Set_Global_Type in
Save_References_In_Identifier. This also removes an unused parameter in
Convert_View and adds a missing comment in Build_Derived_Record_Type.

gcc/ada/

* exp_ch7.adb (Convert_View): Remove Ind parameter and adjust.
* sem_ch12.adb (Check_Generic_Actuals): Check the type of both in
and in out actual objects, as well as the type of formal parameters
of actual subprograms. Extend the condition under which the views
are swapped to nested generic constructs.
(Save_References_In_Identifier): Call Set_Global_Type on a global
identifier rewritten as an explicit dereference, either directly
or after having first been rewritten as a function call.
(Save_References_In_Operator): Set N2 unconditionally and reuse it.
* sem_ch3.adb (Build_Derived_Record_Type): Add missing comment.
* sem_res.adb (Resolve_Implicit_Dereference): Remove special bypass
for private views in instances.

ada: Fix internal error on aggregate within container aggregate

This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.

gcc/ada/

* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
(Expand_Array_Aggregate): Do not delay the expansion if the parent
node is a container aggregate.

ada: Fix -fdiagnostics-format=json not printing all messages

The previous version of this code stopped printing messages as soon as
it encountered a deleted or continuation message. This was wrong,
continuation and deleted messages can be followed by live messages that
do need to be printed.

gcc/ada/

* errout.adb (Output_Messages): Fix loop termination condition.

ada: Introduce -gnateH switch to force reverse Bit_Order threshold to 64

This can be helpful for legacy code that still makes use of an original
reverse Bit_Order clause, i.e. without a Scalar_Storage_Order clause.

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler
Switches): Document -gnateH.
* opt.ads (Reverse_Bit_Order_Threshold): New variable.
* sem_ch13.adb (Adjust_Record_For_Reverse_Bit_Order): Use its value
if it is nonnegative instead of System_Max_Integer_Size.
* switch-c.adb (Scan_Front_End_Switches): Deal with -gnateH.
* usage.adb (Usage): Print -gnateH.
* gnat_ugn.texi: Regenerate.

ada: Update annotations in runtime for proof

With bump of stable SPARK used for proof of the runtime,
some annotations need to change.

gcc/ada/

* libgnat/s-aridou.adb (Scaled_Divide): Add assertions.
* libgnat/s-valuti.adb: Add Loop_Variant.
* libgnat/s-valuti.ads: Add Exceptional_Cases on No_Return
procedure.

ada: Fix type derivation of subtype of derived type

Deriving from a subtype of a derived type of a private type, whose full
view is itself a derived type of a discriminated record with a known
discriminatant was failing with the error message:

invalid constraint: type has no discriminant

The compiler needs to use the full view to be able to constrain the
type.

Also fix minor typo in comments.

gcc/ada/

* sem_ch3.adb (Build_Derived_Record_Type): Use full view as
Parent_Base if needed.

ada: Pass Error_Node to calls to Error_Msg in lib-load.adb

When not passing Error_Node, Error_Msg will treat Current_Node as the
node attached to the message. When this happens in lib-load.adb due to a
file that cannot be loaded, Current_Node might reference a node that
doesn't actually exist. This is a problem when using -gnatdJ and
-fdiagnostics-format, as in this case GNAT will attempt to retrieve
information from the node attached to the message and thus crash when
said node is invalid.

gcc/ada/

* lib-load.adb (Load_Unit): Pass Error_Node to calls to Error_Msg.

ada: Remove references to Might_Not_Return and Always_Return

The Might_Not_Return and Always_Return annotations for GNATprove
should now be replaced by the two more precise aspects
Exceptional_Cases and Always_Terminates.
They allow to specify whether a subprogram is allowed to raise
exceptions or fail to complete.

gcc/ada/

* libgnat/a-strfix.ads: Replace Might_Not_Return annotations by
Exceptional_Cases and Always_Terminates aspects.
* libgnat/a-tideio.ads: Idem.
* libgnat/a-tienio.ads: Idem.
* libgnat/a-tifiio.ads: Idem.
* libgnat/a-tiflio.ads: Idem.
* libgnat/a-tiinio.ads: Idem.
* libgnat/a-timoio.ads: Idem.
* libgnat/a-textio.ads: Idem. Also mark functions Name, Col, Line,
and Page as out of SPARK as they might raise Layout_Error.
* libgnarl/a-reatim.ads: Replace Always_Return annotations by
Always_Terminates aspects.
* libgnat/a-chahan.ads: Idem.
* libgnat/a-nbnbig.ads: Idem.
* libgnat/a-nbnbin.ads: Idem.
* libgnat/a-nbnbre.ads: Idem.
* libgnat/a-ngelfu.ads: Idem.
* libgnat/a-nlelfu.ads: Idem.
* libgnat/a-nllefu.ads: Idem.
* libgnat/a-nselfu.ads: Idem.
* libgnat/a-nuelfu.ads: Idem.
* libgnat/a-strbou.ads: Idem.
* libgnat/a-strmap.ads: Idem.
* libgnat/a-strsea.ads: Idem.
* libgnat/a-strsup.ads: Idem.
* libgnat/a-strunb.ads: Idem.
* libgnat/a-strunb__shared.ads: Idem.
* libgnat/g-souinf.ads: Idem.
* libgnat/i-c.ads: Idem.
* libgnat/interfac.ads: Idem.
* libgnat/interfac__2020.ads: Idem.
* libgnat/s-aridou.adb: Idem.
* libgnat/s-arit32.adb: Idem.
* libgnat/s-atacco.ads: Idem.
* libgnat/s-spcuop.ads: Idem.
* libgnat/s-stoele.ads: Idem.
* libgnat/s-vaispe.ads: Idem.
* libgnat/s-vauspe.ads: Idem.
* libgnat/i-cstrin.ads: Add a precondition instead of a
Might_Not_Return annotation.

ada: Spurious error on package instantiation

The compiler reports spurious errors processing the instantation
of a generic package when the instantation is performed in the
the body of a package that has a private type T, a dispatching
primitive of T has the same name as a component of T, and
an extension of T is used as the actual parameter for a
formal derived type of T in the instantiation.

gcc/ada/

* sem_ch4.adb
(Try_Selected_Component_In_Instance): New subprogram; factorizes
existing code.
(Find_Component_In_Instance) Moved inside the new subprogram.
(Analyze_Selected_Component): Invoke the new subprogram before
trying the Object.Operation notation.

ada: Fix edge case in Ada.Calendar.Formatting.Time_Of

Before this patch, Ada.Calendar.Formatting.Time_Of executed extra code
when passed a number of seconds equal to the number of seconds in a day.
This caused the result to be off, perhaps because a statement resetting
the number of seconds to zero was missing.

Instead of adding such a statement, this patch removes the special
handling of the problematic case, which gives the intended result.

gcc/ada/

* libgnat/a-calfor.adb (Time_Of): Fix handling of special case.

x86: correct and improve "*vec_dupv2di"

The input constraint for the %vmovddup alternative was wrong, as the
upper 16 XMM registers require AVX512VL to be used with this insn. To
compensate, introduce a new alternative permitting all 32 registers, by
broadcasting to the full 512 bits in that case if AVX512VL is not
available.

gcc/

* config/i386/sse.md (vec_dupv2di): Correct %vmovddup input
constraint. Add new AVX512F alternative.

gcc/testsuite/

* gcc.target/i386/avx512f-dupv2di.c: New test.

debug/110295 - mixed up early/late debug for member DIEs

When we process a scope typedef during early debug creation and
we have already created a DIE for the type when the decl is
TYPE_DECL_IS_STUB and this DIE is still in limbo we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, eventually picking up the now completed
type and creating DIEs for the members. Instead this is currently
defered to the second time we come here, when we annotate the
DIEs with locations late where now the type DIE is no longer
in limbo and we fall through doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.

RISC-V: Fix fails of testcases

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
Excess errors:
xgcc: fatal error: Cannot find suitable multilib set for '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
compilation terminated.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.

RISC-V: Add tuple vector mode psABI checking and simplify code

Hi,

This patch does several things:
  1. Adds the missed checking of tuple vector mode
  2. Extend the scope of checking to all vector types, previously it
     was only for scalable vector types.
  3. Simplify the logic of determining code of vector type which will lower to
     vector tmode  code

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_scalable_vector_type_p): Delete.
(riscv_arg_has_vector): Simplify.
(riscv_pass_in_vector_p): Adjust warning message.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Add -Wno-psabi option.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-1.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
* gcc.target/riscv/vector-abi-1.c: Ditto.
* gcc.target/riscv/vector-abi-2.c: Ditto.
* gcc.target/riscv/vector-abi-3.c: Ditto.
* gcc.target/riscv/vector-abi-4.c: Ditto.
* gcc.target/riscv/vector-abi-5.c: Ditto.
* gcc.target/riscv/vector-abi-6.c: Ditto.
* gcc.target/riscv/vector-abi-7.c: New test.
* gcc.target/riscv/vector-abi-8.c: New test.
* gcc.target/riscv/vector-abi-9.c: New test.

Daily bump.

libcpp: reject codepoints above 0x10FFFF

Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>

RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.

In order to avoid interrupt functions to change the FCSR, it needs to be saved
and restored at the beginning and end of the function.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.

Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

gcc/
PR rtl-optimization/110305
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Handle HONOR_SNANS for x + 0.0.

optimize std::max early

we currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back which in turn prevents SRA and we fail to optimize
out some store-to-load pairs.

I looked into why this function is not inlined and it is inlined by clang.  We
currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30
at -O3.  Clang has similar estimate, but still decides to inline at -O2.

I looked into reason why the body is so large and one problem I spotted is the
way std::max is implemented by taking and returning reference to the values.

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later
and max is used by code computing new size of vector on resize.

We optimize this to MAX_EXPR, but only during late optimizations.  I think this
is a common enough coding pattern and we ought to make this transparent to
early opts and IPA.  The following is easist fix that simply adds phiprop pass
that turns the PHI of address values into PHI of values so later FRE can
propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE
remove the memory stores.

gcc/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow clonning.

gcc/testsuite/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.
* gcc.dg/tree-ssa/pr21463.c: Adjust template.

AArch64: convert some patterns to compact MD syntax

Hi All,

This converts some patterns in the AArch64 backend to use the new
compact syntax.

gcc/ChangeLog:

* config/aarch64/aarch64.md (arches): Add nosimd.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Rewrite to
compact syntax.

New compact syntax for insn and insn_split in Machine Descriptions.

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The idea behind this compact syntax is that often times it's quite hard to
correlate the entries in the constrains list, attributes and instruction lists.

One has to count and this often is tedious.  Additionally when changing a single
line in the insn multiple lines in a diff change, making it harder to see what's
going on.

This new syntax takes into account many of the common things that are done in MD
files.   It's also worth saying that this version is intended to deal with the
common case of a string based alternatives.   For C chunks we have some ideas
but those are not intended to be addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  r,  r,  r, w,r,w, w")
(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
  "(register_operand (operands[0], SImode)
    || aarch64_reg_or_zero (operands[1], SImode))"
  "@
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %1
   #
   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
   ldr\\t%w0, %1
   ldr\\t%s0, %1
   str\\t%w1, %0
   str\\t%s1, %0
   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
   adr\\t%x0, %c1
   adrp\\t%x0, %A1
   fmov\\t%s0, %w1
   fmov\\t%w0, %s1
   fmov\\t%s0, %s1
   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
    && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   "{
       aarch64_expand_mov_immediate (operands[0], operands[1]);
       DONE;
    }"
  ;; The "mov_imm" type for CNT is just a placeholder.
  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
    load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
   (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
   (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand")
(match_operand:SI 1 "aarch64_mov_operand"))]
  "(register_operand (operands[0], SImode)
    || aarch64_reg_or_zero (operands[1], SImode))"
  {@ [cons: =0, 1; attrs: type, arch, length]
     [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
     [k , r  ; mov_reg  , *   , 4] ^
     [r , k  ; mov_reg  , *   , 4] ^
     [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
     [r , n  ; mov_imm  , *   ,16] #
     /* The "mov_imm" type for CNT is just a placeholder.  */
     [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
     [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
     [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
     [m , rZ ; store_4  , *   , 4] str\t%w1, %0
     [m , w  ; store_4  , fp  , 4] str\t%s1, %0
     [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
     [r , Usa; adr      , *   , 4] adr\t%x0, %c1
     [r , Ush; adr      , *   , 4] adrp\t%x0, %A1
     [w , rZ ; f_mcr    , fp  , 4] fmov\t%s0, %w1
     [r , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
     [w , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
     [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
  }
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
    && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
  [(const_int 0)]
  {
    aarch64_expand_mov_immediate (operands[0], operands[1]);
    DONE;
  }
)

The main syntax rules are as follows (See docs for full rules):
  - Template must start with "{@" and end with "}" to use the new syntax.
  - "{@" is followed by a layout in parentheses which is "cons:" followed by
    a list of match_operand/match_scratch IDs, then a semicolon, then the
    same for attributes ("attrs:"). Both sections are optional (so you can
    use only cons, or only attrs, or both), and cons must come before attrs
    if present.
  - Each alternative begins with any amount of whitespace.
  - Following the whitespace is a comma-separated list of constraints and/or
    attributes within brackets [], with sections separated by a semicolon.
  - Following the closing ']' is any amount of whitespace, and then the actual
    asm output.
  - Spaces are allowed in the list (they will simply be removed).
  - All alternatives should be specified: a blank list should be
    "[,,]", "[,,;,]" etc., not "[]" or "" (however genattr may segfault if
    you leave certain attributes empty, I have found).
  - The actual constraint string in the match_operand or match_scratch, and
    the attribute string in the set_attr, must be blank or an empty string
    (you can't combine the old and new syntaxes).
  - The common idion * return can be shortened by using <<.
  - Any unexpanded iterators left during processing will result in an error at
    compile time.   If for some reason <> is needed in the output then these
    must be escaped using \.
  - Within an {@ block both multiline and singleline C comments are allowed, but
    when used outside of a C block they must be the only non-whitespace blocks on
    the line
  - Inside an {@ block any unexpanded iterators will result in a compile time
    fault instead of incorrect assembly being generated at runtime.  If the
    literal <> is needed in the output this needs to be escaped with \<\>.
  - This check is not performed inside C blocks (lines starting with *).
  - Instead of copying the previous instruction again in the next pattern, one
    can use ^ to refer to the previous asm string.

This patch works by blindly transforming the new syntax into the old syntax,
so it doesn't do extensive checking. However, it does verify that:
- The correct number of constraints/attributes are specified.
- You haven't mixed old and new syntax.
- The specified operand IDs/attribute names actually exist.
- You don't have duplicate cons

If something goes wrong, it may write invalid constraints/attributes/template
back into the rtx. But this shouldn't matter because error_at will cause the
program to fail on exit anyway.

Because this transformation occurs as early as possible (before patterns are
queued), the rest of the compiler can completely ignore the new syntax and
assume that the old syntax will always be used.

This doesn't seem to have any measurable effect on the runtime of gen*
programs.

gcc/ChangeLog:

* gensupport.cc (class conlist, add_constraints, add_attributes,
skip_spaces, expect_char, preprocess_compact_syntax,
parse_section_layout, parse_section, convert_syntax): New.
(process_rtx): Check for conversion.
* genoutput.cc (process_template): Check for unresolved iterators.
(class data): Add compact_syntax_p.
(gen_insn): Use it.
* gensupport.h (compact_syntax): New.
(hash-set.h): Include.
* doc/md.texi: Document it.

Co-Authored-By: Omar Tahir <Omar.Tahir2@arm.com>

recog: Change return type of predicate functions from int to bool

Also change some internal variables to bool and change return type of
split_all_insns_noflow to void.

gcc/ChangeLog:

* recog.h (check_asm_operands): Change return type from int to bool.
(insn_invalid_p): Ditto.
(verify_changes): Ditto.
(apply_change_group): Ditto.
(constrain_operands): Ditto.
(constrain_operands_cached): Ditto.
(validate_replace_rtx_subexp): Ditto.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(added_clobbers_hard_reg_p): Ditto.
(peep2_regno_dead_p): Ditto.
(peep2_reg_dead_p): Ditto.
(store_data_bypass_p): Ditto.
(if_test_bypass_p): Ditto.
* rtl.h (split_all_insns_noflow): Change
return type from unsigned int to void.
* genemit.cc (output_added_clobbers_hard_reg_p): Change return type
of generated added_clobbers_hard_reg_p from int to bool and adjust
function body accordingly.  Change "used" variable type from
int to bool.
* recog.cc (check_asm_operands): Change return type
from int to bool and adjust function body accordingly.
(insn_invalid_p): Ditto.  Change "is_asm" variable to bool.
(verify_changes): Change return type from int to bool.
(apply_change_group): Change return type from int to bool
and adjust function body accordingly.
(validate_replace_rtx_subexp): Change return type from int to bool.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(constrain_operands_cached): Ditto.
(constrain_operands): Ditto.  Change "lose" and "win"
variables type from int to bool.
(split_all_insns_noflow): Change return type from unsigned int
to void and adjust function body accordingly.
(peep2_regno_dead_p): Change return type from int to bool.
(peep2_reg_dead_p): Ditto.
(peep2_find_free_register): Change "success"
variable type from int to bool
(store_data_bypass_p_1): Change return type from int to bool.
(store_data_bypass_p): Ditto.

RISC-V: Fix VWEXTF iterator requirement

gcc/ChangeLog:

* config/riscv/vector-iterators.md: zvfh/zvfhmin depends on the
Zve32f extension.

RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

The rvv widdening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributions.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

Thus there will be a problem here. For example zve32, we will have
code_for_reduc (max, VNx1HF, VNx1HF) which will return the code of
the ZVE128+ instead of the ZVE32 logically.

This patch will merge the 3 patterns into pattern, and pass both the
input_vector and the ret_vector of code_for_reduc. For example, ZVE32
will be code_for_reduc (max, VNx1HF, VNx2HF), then the correct code of ZVE32
will be returned as expectation.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

PR target/110299
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32, VI_ZVE64, VI_ZVE32, VWI, VWI_ZVE64, VWI_ZVE32,
VF_ZVE63 and VF_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus<v_su><mode><vwlmul1>): Removed.
(@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto.
(@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve32>): Ditto.
(@pred_widen_reduc_plus<order><mode><vwlmul1>): Ditto.
(@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto.
(@pred_widen_reduc_plus<v_su><VQI:mode><VHI_LMUL1:mode>): New pattern.
(@pred_widen_reduc_plus<v_su><VHI:mode><VSI_LMUL1:mode>): Ditto.
(@pred_widen_reduc_plus<v_su><VSI:mode><VDI_LMUL1:mode>): Ditto.
(@pred_widen_reduc_plus<order><VHF:mode><VSF_LMUL1:mode>): Ditto.
(@pred_widen_reduc_plus<order><VSF:mode><VDF_LMUL1:mode>): Ditto.

gcc/testsuite/ChangeLog:

PR target/110299
* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.

RISC-V: Bugfix for RVV float reduction in ZVE32/64

The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributions.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

Thus there will be a problem here. For example zve32, we will have
code_for_reduc (max, VNx1HF, VNx1HF) which will return the code of
the ZVE128+ instead of the ZVE32 logically.

This patch will merge the 3 patterns into pattern, and pass both the
input_vector and the ret_vector of code_for_reduc. For example, ZVE32
will be code_for_reduc (max, VNx1HF, VNx2HF), then the correct code of ZVE32
will be returned as expectation.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

PR target/110277
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
ret_mode.
* config/riscv/vector-iterators.md: Add VHF, VSF, VDF,
VHF_LMUL1, VSF_LMUL1, VDF_LMUL1, and remove unused attr.
* config/riscv/vector.md (@pred_reduc_<reduc><mode><vlmul1>): Removed.
(@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto.
(@pred_reduc_<reduc><mode><vlmul1_zve32>): Ditto.
(@pred_reduc_plus<order><mode><vlmul1>): Ditto.
(@pred_reduc_plus<order><mode><vlmul1_zve32>): Ditto.
(@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto.
(@pred_reduc_<reduc><VHF:mode><VHF_LMUL1:mode>): New pattern.
(@pred_reduc_<reduc><VSF:mode><VSF_LMUL1:mode>): Ditto.
(@pred_reduc_<reduc><VDF:mode><VDF_LMUL1:mode>): Ditto.
(@pred_reduc_plus<order><VHF:mode><VHF_LMUL1:mode>): Ditto.
(@pred_reduc_plus<order><VSF:mode><VSF_LMUL1:mode>): Ditto.
(@pred_reduc_plus<order><VDF:mode><VDF_LMUL1:mode>): Ditto.

gcc/testsuite/ChangeLog:

PR target/110277
* gcc.target/riscv/rvv/base/pr110277-1.c: New test.
* gcc.target/riscv/rvv/base/pr110277-1.h: New test.
* gcc.target/riscv/rvv/base/pr110277-2.c: New test.
* gcc.target/riscv/rvv/base/pr110277-2.h: New test.

amdgcn: implement vector div and mod libfuncs

Also divmod, but only for scalar modes, for now (because there are no complex
int vectors yet).

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_divmod_libfunc): New function.
(gcn_init_libfuncs): Add div and mod functions for all modes.
Add placeholders for divmod functions.
(TARGET_EXPAND_DIVMOD_LIBFUNC): Define.

libgcc/ChangeLog:

* config/gcn/lib2-divmod-di.c: Reimplement like lib2-divmod.c.
* config/gcn/lib2-divmod.c: Likewise.
* config/gcn/lib2-gcn.h: Add new types and prototypes for all the
new vector libfuncs.
* config/gcn/t-amdgcn: Add new files.
* config/gcn/amdgcn_veclib.h: New file.
* config/gcn/lib2-vec_divmod-di.c: New file.
* config/gcn/lib2-vec_divmod-hi.c: New file.
* config/gcn/lib2-vec_divmod-qi.c: New file.
* config/gcn/lib2-vec_divmod.c: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/predcom-2.c: Avoid vectors on amdgcn.
* gcc.dg/unroll-8.c: Likewise.
* gcc.dg/vect/slp-26.c: Change expected results on amdgdn.
* lib/target-supports.exp
(check_effective_target_vect_int_mod): Add amdgcn.
(check_effective_target_divmod): Likewise.
* gcc.target/gcn/simd-math-3-16.c: New test.
* gcc.target/gcn/simd-math-3-2.c: New test.
* gcc.target/gcn/simd-math-3-32.c: New test.
* gcc.target/gcn/simd-math-3-4.c: New test.
* gcc.target/gcn/simd-math-3-8.c: New test.
* gcc.target/gcn/simd-math-3-char-16.c: New test.
* gcc.target/gcn/simd-math-3-char-2.c: New test.
* gcc.target/gcn/simd-math-3-char-32.c: New test.
* gcc.target/gcn/simd-math-3-char-4.c: New test.
* gcc.target/gcn/simd-math-3-char-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run-16.c: New test.
* gcc.target/gcn/simd-math-3-char-run-2.c: New test.
* gcc.target/gcn/simd-math-3-char-run-32.c: New test.
* gcc.target/gcn/simd-math-3-char-run-4.c: New test.
* gcc.target/gcn/simd-math-3-char-run-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run.c: New test.
* gcc.target/gcn/simd-math-3-char.c: New test.
* gcc.target/gcn/simd-math-3-long-16.c: New test.
* gcc.target/gcn/simd-math-3-long-2.c: New test.
* gcc.target/gcn/simd-math-3-long-32.c: New test.
* gcc.target/gcn/simd-math-3-long-4.c: New test.
* gcc.target/gcn/simd-math-3-long-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run-16.c: New test.
* gcc.target/gcn/simd-math-3-long-run-2.c: New test.
* gcc.target/gcn/simd-math-3-long-run-32.c: New test.
* gcc.target/gcn/simd-math-3-long-run-4.c: New test.
* gcc.target/gcn/simd-math-3-long-run-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run.c: New test.
* gcc.target/gcn/simd-math-3-long.c: New test.
* gcc.target/gcn/simd-math-3-run-16.c: New test.
* gcc.target/gcn/simd-math-3-run-2.c: New test.
* gcc.target/gcn/simd-math-3-run-32.c: New test.
* gcc.target/gcn/simd-math-3-run-4.c: New test.
* gcc.target/gcn/simd-math-3-run-8.c: New test.
* gcc.target/gcn/simd-math-3-run.c: New test.
* gcc.target/gcn/simd-math-3-short-16.c: New test.
* gcc.target/gcn/simd-math-3-short-2.c: New test.
* gcc.target/gcn/simd-math-3-short-32.c: New test.
* gcc.target/gcn/simd-math-3-short-4.c: New test.
* gcc.target/gcn/simd-math-3-short-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run-16.c: New test.
* gcc.target/gcn/simd-math-3-short-run-2.c: New test.
* gcc.target/gcn/simd-math-3-short-run-32.c: New test.
* gcc.target/gcn/simd-math-3-short-run-4.c: New test.
* gcc.target/gcn/simd-math-3-short-run-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run.c: New test.
* gcc.target/gcn/simd-math-3-short.c: New test.
* gcc.target/gcn/simd-math-3.c: New test.
* gcc.target/gcn/simd-math-4-char-run.c: New test.
* gcc.target/gcn/simd-math-4-char.c: New test.
* gcc.target/gcn/simd-math-4-long-run.c: New test.
* gcc.target/gcn/simd-math-4-long.c: New test.
* gcc.target/gcn/simd-math-4-run.c: New test.
* gcc.target/gcn/simd-math-4-short-run.c: New test.
* gcc.target/gcn/simd-math-4-short.c: New test.
* gcc.target/gcn/simd-math-4.c: New test.
* gcc.target/gcn/simd-math-5-16.c: New test.
* gcc.target/gcn/simd-math-5-32.c: New test.
* gcc.target/gcn/simd-math-5-4.c: New test.
* gcc.target/gcn/simd-math-5-8.c: New test.
* gcc.target/gcn/simd-math-5-char-16.c: New test.
* gcc.target/gcn/simd-math-5-char-32.c: New test.
* gcc.target/gcn/simd-math-5-char-4.c: New test.
* gcc.target/gcn/simd-math-5-char-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run-16.c: New test.
* gcc.target/gcn/simd-math-5-char-run-32.c: New test.
* gcc.target/gcn/simd-math-5-char-run-4.c: New test.
* gcc.target/gcn/simd-math-5-char-run-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run.c: New test.
* gcc.target/gcn/simd-math-5-char.c: New test.
* gcc.target/gcn/simd-math-5-long-16.c: New test.
* gcc.target/gcn/simd-math-5-long-32.c: New test.
* gcc.target/gcn/simd-math-5-long-4.c: New test.
* gcc.target/gcn/simd-math-5-long-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run-16.c: New test.
* gcc.target/gcn/simd-math-5-long-run-32.c: New test.
* gcc.target/gcn/simd-math-5-long-run-4.c: New test.
* gcc.target/gcn/simd-math-5-long-run-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run.c: New test.
* gcc.target/gcn/simd-math-5-long.c: New test.
* gcc.target/gcn/simd-math-5-run-16.c: New test.
* gcc.target/gcn/simd-math-5-run-32.c: New test.
* gcc.target/gcn/simd-math-5-run-4.c: New test.
* gcc.target/gcn/simd-math-5-run-8.c: New test.
* gcc.target/gcn/simd-math-5-run.c: New test.
* gcc.target/gcn/simd-math-5-short-16.c: New test.
* gcc.target/gcn/simd-math-5-short-32.c: New test.
* gcc.target/gcn/simd-math-5-short-4.c: New test.
* gcc.target/gcn/simd-math-5-short-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run-16.c: New test.
* gcc.target/gcn/simd-math-5-short-run-32.c: New test.
* gcc.target/gcn/simd-math-5-short-run-4.c: New test.
* gcc.target/gcn/simd-math-5-short-run-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run.c: New test.
* gcc.target/gcn/simd-math-5-short.c: New test.
* gcc.target/gcn/simd-math-5.c: New test.

amdgcn: Delete inactive libfuncs

The HImode libfuncs weren't called and trying to enable them fails because
TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness
isn't known.

libgcc/ChangeLog:

* config/gcn/lib2-gcn.h (QItype, UQItype, HItype, UHItype): Delete.
(__divhi3, __modhi3, __udivhi3, __umodhi3): Delete.
* config/gcn/t-amdgcn: Don't build lib2-divmod-hi.c.
* config/gcn/lib2-divmod-hi.c: Removed.

vect: vectorize via libfuncs

This patch allows vectorization when the libfuncs are defined.

gcc/ChangeLog:

* tree-vect-generic.cc: Include optabs-libfuncs.h.
(get_compute_type): Check optab_libfunc.
* tree-vect-stmts.cc: Include optabs-libfuncs.h.
(vectorizable_operation): Check optab_libfunc.

amdgcn: minimal V64TImode vector support

Just enough support for TImode vectors to exist, load, store, move,
without any real instructions available.

This is primarily for the use of divmodv64di4, which uses TImode to
return a pair of DImode values.

gcc/ChangeLog:

* config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function.
* config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators.
(V_MOV, V_MOV_ALT): Likewise.
(scalar_mode, SCALAR_MODE): Add TImode.
(vnsi, VnSI, vndi, VnDI): Likewise.
(vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV.
(mov<mode>, mov<mode>_unspec): Use V_MOV.
(*mov<mode>_4reg): New insn.
(mov<mode>_exec): New 4reg variant.
(mov<mode>_sgprbase): Likewise.
(reload_in<mode>, reload_out<mode>): Use V_MOV.
(vec_set<mode>): Likewise.
(vec_duplicate<mode><exec>): New 4reg variant.
(vec_extract<mode><scalar_mode>): Likewise.
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
(vec_extract<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
(vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop): New 4reg variant.
(fold_extract_last_<mode>): Use V_MOV.
(vec_init<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
(vec_init<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
(gather_load<mode><vnsi>, gather<mode>_expr<exec>,
gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>,
gather<mode>_insn_2offsets<exec>): Use V_MOV.
(scatter_store<mode><vnsi>, scatter<mode>_expr<exec_scatter>,
scatter<mode>_insn_1offset<exec_scatter>,
scatter<mode>_insn_1offset_ds<exec_scatter>,
scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
(maskload<mode>di, maskstore<mode>di, mask_gather_load<mode><vnsi>,
mask_scatter_store<mode><vnsi>): Likewise.
* config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p.
(gcn_hard_regno_mode_ok): Likewise.
(GEN_VNM): Add TImode support.
(USE_TI): New macro. Separate TImode operations from non-TImode ones.
(gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode,
V8TImode, and V2TImode.
(print_operand): Add 'J' and 'K' print codes.

Remove -save-temps from tests using -flto

The following removes -save-temps that doesn't seem to have any
good reason from tests that also run with -flto added. That can
cause ltrans files to race with other multilibs tested and I'm
frequently seeing linker complaints that the architecture
doesn't match here.

I'm not sure whether the .ltrans.o files end up in a non gccN/
specific directory or if we end up sharing the same dir for
different multilibs (not sure if it's easily possible to avoid that).

* gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
* gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.

tree-optimization/110298 - CFG cleanup and stale nb_iterations

When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.

PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.

* gcc.dg/torture/pr110298.c: New testcase.

Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'

ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}"

Fix-up for recent commit 01fe115ba7eafebcf97bbac9e157038a003d0c85
"libgomp.c/target-51.c: Accept more error-msg variants in dg-output".

libgomp/
* testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax
error.

simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector

In the testcase for this patch we try to vec_concat the lowpart and highpart of a vector, but the lowpart is expressed as a subreg.
simplify-rtx.cc does not recognise this and combine ends up trying to match:
Trying 7 -> 8:
    7: r93:V2SI=vec_select(r95:V4SI,parallel)
    8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI)
      REG_DEAD r95:V4SI
      REG_DEAD r93:V2SI
Failed to match this instruction:
(set (reg:V4SI 97)
    (vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0)
        (vec_select:V2SI (reg/v:V4SI 95 [ a ])
            (parallel:V4SI [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))))

This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a simplification.
The testcase is a bit artificial, but I do have other aarch64-specific patterns that I want to optimise later
that rely on this simplification happening.

Without this patch for the testcase we generate:
foo:
        dup     d31, v0.d[1]
        ins     v0.d[1], v31.d[0]
        ret

whereas we should just not generate anything as the operation is ultimately a no-op.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify vec_concat of lowpart subreg and high part vec_select.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/low-high-combine_1.c: New test.

Doc update: -foffload-options= examples + OpenMP in Fortran intrinsic modules

With LTO, the -O.. flags of the host are passed on to the lto compiler, which
also includes offloading compilers. Therefore, using --foffload-options=-O3 is
misleading as it implies that without the default optimizations are used. Hence,
this flags has now been removed from the usage examples.

The Fortran documentation lists the content (except for API routines) routines
of the intrinsic OpenMP modules OMP_LIB and OMP_LIB_KINDS; this commit adds
two missing named constants and links also to the OpenMP 5.1 and 5.2
OpenMP spec for completeness.

gcc/ChangeLog:

* doc/invoke.texi (-foffload-options): Remove '-O3' from the examples.

gcc/fortran/ChangeLog:

* intrinsic.texi (OpenMP Modules OMP_LIB and OMP_LIB_KINDS): Also
add references to the OpenMP 5.1 and 5.2 spec; add omp_initial_device
and omp_invalid_device named constants.

vect: Restore aarch64 bootstrap

gcc/
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Handle null niters_skip.

Fix build of aarc64

The following fixes a reference to LOOP_VINFO_MASKS array in the
aarch64 backend after my changes.

* config/aarch64/aarch64.cc
(aarch64_vector_costs::analyze_loop_vinfo): Fix reference
to LOOP_VINFO_MASKS.

avr: Fix wrong array bounds warning on SFR access

The warning was raised on accessing SFRs at addresses below the default
page size, as gcc considers accessing addresses in the first page of
memory as suspicious. This doesn't apply to an embedded target like the
avr, where both flash and RAM have zero as a valid address. Zero is also
a valid address in named address spaces (__memx, flash<n> etc..).

This commit implements TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID for the avr
target and reports to gcc that zero is a valid address on all
address spaces. It also disables flag_delete_null_pointer_checks
based on the target hook, and modifies target-supports.exp to add avr
to the list of targets that always keep null pointer checks. This fixes
a bunch of DejaGNU failures that occur otherwise.

PR target/105523

gcc/ChangeLog:

* common/config/avr/avr-common.cc: Remove setting
of OPT_fdelete_null_pointer_checks.
* config/avr/avr.cc (avr_option_override): Clear
flag_delete_null_pointer_checks if zero_address_valid.
(avr_addr_space_zero_address_valid): New function.
(TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Provide target
hook.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_keeps_null_pointer_checks): Add
avr.
* gcc.target/avr/pr105523.c: New test.

VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
like RISC-V that uses length in loop control.
Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).

Consider these 4 following cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
Code: v1 = MEM (...)
  for (int i = 0; i < 4; i++)           v2 = MEM (...)
    a[i] = b[i] + c[i];                 v3 = v1 + v2
                                        MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = comparison):
Code:                                   mask = comparison
  for (int i = 0; i < 4; i++)           v1 = LEN_MASK_LOAD (length = VF, mask)
    if (cond[i])                        v2 = LEN_MASK_LOAD (length = VF, mask)
      a[i] = b[i] + c[i];               v3 = v1 + v2
                                        LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:                                   loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)           v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
      a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
                                        v3 = v1 + v2
                                        LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:                                   loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)           mask = comparison
      if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, mask)
      a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask)
                                        v3 = v1 + v2
                                        LEN_MASK_STORE (length = loop_len, mask, v3)

Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
gcc/ChangeLog:

* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

RISC-V: Add autovec FP unary operations.

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similary to the binop tests, there are flavors for zvfh now.

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add unops.

RISC-V: Add autovec FP binary operations.

This implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.

In order to allow proper _Float16 this patch disables
general _Float16 promotion to float TARGET_ZVFH is defined
similar to TARGET_ZFH or TARGET_ZHINX.

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(enum vxrm_field_enum): Rename this...
(enum fixed_point_rounding_mode): ...to this.
(enum frm_field_enum): Rename this...
(enum floating_point_rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
* lib/target-supports.exp: Add riscv_vector_hw and riscv_zvfh_hw
target selectors.

RISC-V: Add sign-extending variants for vmv.x.s.

When the destination register of a vmv.x.s needs to be sign extended to
XLEN we currently emit an sext insn. Since vmv.x.s performs this
automatically this patch adds two instruction patterns that include
sign_extend for the destination operand.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Add VI_QH iterator.
* config/riscv/autovec-opt.md
(@pred_extract_first_sextdi<mode>): New vmv.x.s pattern
that includes sign extension.
(@pred_extract_first_sextsi<mode>): Dito for SImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ensure
that no sext insns are present.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Dito.

RISC-V: Implement vec_set and vec_extract.

This implements the vec_set and vec_extract patterns for integer and
floating-point data types. For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.

vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.

The patch does not include vector-vector extraction which
will be done at a later time.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_set<mode>): Implement.
(vec_extract<mode><vel>): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
New test.

RISC-V: Add (u)int8_t to binop tests.

This patch adds the missing (u)int8_t types to the binop tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Adapt for
(u)int8_t.
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Dito.

libgomp.c/target-51.c: Accept more error-msg variants in dg-output

Depending on the details, the testcase can fail with different but
related messages; all of the following all could be observed for this
testcase:

  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found
  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available

Before, the last two were tested for with 'target offload_device' and
'! offload_device', respectively. Now, all three are accepted by matching
'.*' already after 'but' and without distinguishing whether the effective
target is an offload_device or not.

(For completeness, there is a fourth error that follows this pattern:
'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.)

libgomp/

* testsuite/libgomp.c/target-51.c: Accept more error msg variants
as expected dg-output.

AVX512 fully masked vectorization

This implemens fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).

Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size   scalar     128     256     512    512e    512f
    1    9.42   11.32    9.35   11.17   15.13   16.89
    2    5.72    6.53    6.66    6.66    7.62    8.56
    3    4.49    5.10    5.10    5.74    5.08    5.73
    4    4.10    4.33    4.29    5.21    3.79    4.25
    6    3.78    3.85    3.86    4.76    2.54    2.85
    8    3.64    1.89    3.76    4.50    1.92    2.16
   12    3.56    2.21    3.75    4.26    1.26    1.42
   16    3.36    0.83    1.06    4.16    0.95    1.07
   20    3.39    1.42    1.33    4.07    0.75    0.85
   24    3.23    0.66    1.72    4.22    0.62    0.70
   28    3.18    1.09    2.04    4.20    0.54    0.61
   32    3.16    0.47    0.41    0.41    0.47    0.53
   34    3.16    0.67    0.61    0.56    0.44    0.50
   38    3.19    0.95    0.95    0.82    0.40    0.45
   42    3.09    0.58    1.21    1.13    0.36    0.40

'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on the AVX512 masked epilog code is clearly
the winner, the fully masked variant is clearly worse and
it's size benefit is also tiny.

This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

The vect_prepare_for_masked_peels hunk might run into issues with
SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).

* tree-vectorizer.h (enum vect_partial_vector_style): New.
(_loop_vec_info::partial_vector_style): Likewise.
(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
(rgroup_controls::compare_type): Add.
(vec_loop_masks): Change from a typedef to auto_vec<>
to a structure.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Adjust.  Convert niters_skip to compare_type.
(vect_set_loop_condition_partial_vectors_avx512): New function
implementing the AVX512 partial vector codegen.
(vect_set_loop_condition): Dispatch to the correct
vect_set_loop_condition_partial_vectors_* function based on
LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_prepare_for_masked_peels): Compute LOOP_VINFO_MASK_SKIP_NITERS
in the original niter type.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
partial_vector_style.
(can_produce_all_loop_masks_p): Adjust.
(vect_verify_full_masking): Produce the rgroup_controls vector
here.  Set LOOP_VINFO_PARTIAL_VECTORS_STYLE on success.
(vect_verify_full_masking_avx512): New function implementing
verification of AVX512 style masking.
(vect_verify_loop_lens): Set LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_analyze_loop_2): Also try AVX512 style masking.
Adjust condition.
(vect_estimate_min_profitable_iters): Implement AVX512 style
mask producing cost.
(vect_record_loop_mask): Do not build the rgroup_controls
vector here but record masks in a hash-set.
(vect_get_loop_mask): Implement AVX512 style mask query,
complementing the existing while_ult style.

Add loop_vinfo argument to vect_get_loop_mask

This adds a loop_vinfo argument for future use, making the next
patch smaller.

* tree-vectorizer.h (vect_get_loop_mask): Add loop_vec_info
argument.
* tree-vect-loop.cc (vect_get_loop_mask): Likewise.
(vectorize_fold_left_reduction): Adjust.
(vect_transform_reduction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.

OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270]

For C/C++ pointers, default implicit mapping firstprivatizes the pointer
but if the memory it points to is mapped, the it is updated to point to
the device memory (by attaching a zero sized array section of the pointed-to
storage).

However, if the pointed-to storage wasn't mapped, the pointer was set to
NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the
pointer retains the on-host address in that case (OpenMP 5.2 semantic).

The new semantic avoids an explicit map/firstprivate/is_device_ptr in the
following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.),
explicitly device allocated memory (e.g. omp_target_alloc), and with
(unified) shared memory.
(Note: With (U)SM, mappings still must be tracked, at least when
omp_target_associate_ptr does not fail when passing in two destinct pointers.)

libgomp/

PR middle-end/110270
* target.c (gomp_map_vars_internal): Copy host value instead of NULL
for GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped.
* libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'.
* testsuite/libgomp.c/target-19.c: Update expected value.
* testsuite/libgomp.c++/target-18.C: Likewise.
* testsuite/libgomp.c++/target-19.C: Likewise.
* testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.

avr: Fix ICE on optimize attribute.

This commit fixes an ICE when an optimize attribute changes the prevailing
optimization level.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105069 describes the
same ICE for the sh target, where the fix was to enable save/restore of
target specific options modified via TARGET_OPTIMIZATION_TABLE hook.

For the AVR target, mgas-isr-prologues and -mmain-is-OS_task are those
target specific options. As they enable generation of more optimal code,
this commit adds the Optimization option property to those option records,
and that fixes the ICE.

Regression run shows no regressions, and >100 new PASSes.

PR target/110086

gcc/ChangeLog:

* config/avr/avr.opt (mgas-isr-prologues, mmain-is-OS_task):
Add Optimization option property.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr110086.c: New test.

xtensa: constantsynth: Add new 2-insns synthesis pattern

This patch adds a new 2-instructions constant synthesis pattern:

- A non-negative square value that root can fit into a signed 12-bit:
=> "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax"

Due to the execution cost of the integer multiply instruction (MULL), this
synthesis works only when the 32-bit Integer Multiply Option is configured
and optimize for size is specified.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_constantsynth_2insn):
Add new pattern for the abovementioned case.

xtensa: Remove TARGET_MEMORY_MOVE_COST hook

It used to always return a constant 4, which is same as the default
behavior, but doesn't take into account the effects of secondary
reloads.

Therefore, the implementation of this target hook is removed.

gcc/ChangeLog:

* config/xtensa/xtensa.cc
(TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.

rs6000: Enable const_anchor for 'addi'

There is a functionality as const_anchor in cse.cc.  This const_anchor
supports to generate new constants through adding small gap/offsets to
existing constant.  For example:

void __attribute__ ((noinline)) foo (long long *a)
{
  *a++ = 0x2351847027482577LL;
  *a++ = 0x2351847027482578LL;
}
The second constant (0x2351847027482578LL) can be compated by adding '1'
to the first constant (0x2351847027482577LL).
This is profitable if more than one instructions are need to build the
second constant.

* For rs6000, we can enable this functionality, as the instruction
'addi' is just for this when gap is smaller than 0x8000.

* One potential side effect of this feature:
Comparing with
"r101=0x2351847027482577LL
...
r201=0x2351847027482578LL"
The new r201 will be "r201=r101+1", and then r101 will live longer,
and would increase pressure when allocating registers.
But I feel, this would be acceptable for this const_anchor feature.

With this feature, for GCC source code and SPEC object files, the
significant changes are the improvement that: "addi" vs. "2 or more
insns: lis+or.."; it also exposes some other optimizations
opportunities: like combine/jump2. While the side effect is also
occurring in few cases, but it does not impact overall performance.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_anchors.c: New test.
* gcc.target/powerpc/try_const_anchors_ice.c: New test.

Check SCALAR_INT_MODE_P in try_const_anchors

The const_anchor in cse.cc supports integer constants only.
There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
try_const_anchors.

In the latest code, some non-integer modes are used with const int.
For examples:
"set (mem/c:BLK (xx) (const_int 0 [0])" occur in md files of
rs6000, i386, arm, and pa. For this, the mode may be BLKmode.
Pattern "(set (strict_low_part (xx)) (const_int xx))" could
be generated in a few ports. For this, the mode may be VOIDmode.

So, avoid mode other than SCALAR_INT_MODE in try_const_anchors
would be needed.

Some discussions in the previous thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html

gcc/ChangeLog:

* cse.cc (try_const_anchors): Check SCALAR_INT_MODE.

Refined 256/512-bit vpacksswb/vpackssdw patterns.

The packing in vpacksswb/vpackssdw is not a simple concat, it's an
interweave from src1 and src2 for every 128 bit(or 64-bit for the
ss_truncate result).

.i.e.

dst[192-255] = ss_truncate (src2[128-255])
dst[128-191] = ss_truncate (src1[128-255])
dst[64-127] = ss_truncate (src2[0-127])
dst[0-63] = ss_truncate (src1[0-127]

The patch refined those patterns with an extra vec_select for the
interweave.

gcc/ChangeLog:

PR target/110235
* config/i386/sse.md (<sse2_avx2>_packsswb<mask_name>):
Substitute with ..
(sse2_packsswb<mask_name>): .. this, ..
(avx2_packsswb<mask_name>): .. this and ..
(avx512bw_packsswb<mask_name>): .. this.
(<sse2_avx2>_packssdw<mask_name>): Substitute with ..
(sse2_packssdw<mask_name>): .. this, ..
(avx2_packssdw<mask_name>): .. this and ..
(avx512bw_packssdw<mask_name>): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-vpackssdw-3.c: New test.
* gcc.target/i386/avx512bw-vpacksswb-3.c: New test.

Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of original us_truncate.

packuswb/packusdw does unsigned saturation for signed source, but rtl
us_truncate means does unsigned saturation for unsigned source.
So for value -1, packuswb will produce 0, but us_truncate produces
255. The patch reimplement those related patterns and functions with
UNSPEC_US_TRUNCATE instead of us_truncate.

gcc/ChangeLog:

PR target/110235
* config/i386/i386-expand.cc (ix86_split_mmx_pack): Use
UNSPEC_US_TRUNCATE instead of original us_truncate for
packusdw/packuswb.
* config/i386/mmx.md (mmx_pack<s_trunsuffix>swb): Substitute
with ..
(mmx_packsswb): .. this and ..
(mmx_packuswb): .. this.
(mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original
us_truncate.
(s_trunsuffix): Removed code iterator.
(any_s_truncate): Ditto.
* config/i386/sse.md (<sse2_avx2>_packuswb<mask_name>): Use
UNSPEC_US_TRUNCATE instead of original us_truncate.
(<sse4_1_avx2>_packusdw<mask_name>): Ditto.
* config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum.

Daily bump.

RISC-V: Fix one typo for reduc expand GET_MODE_CLASS

This patch would like to fix one typo when GET_MODE_CLASS by mode.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Fix one typo.

Silence warning in gcc.dg/lto/20091013-1_0.c

gcc/testsuite/ChangeLog:

* gcc.dg/lto/20091013-1_0.c: Disable stringop-overread warning.

RTL: Change return type of predicate and callback functions from int to bool

gcc/ChangeLog:

* rtl.h (*rtx_equal_p_callback_function):
Change return type from int to bool.
(rtx_equal_p): Ditto.
(*hash_rtx_callback_function): Ditto.
* rtl.cc (rtx_equal_p): Change return type from int to bool
and adjust function body accordingly.
* early-remat.cc (scratch_equal): Ditto.
* sel-sched-ir.cc (skip_unspecs_callback): Ditto.
(hash_with_unspec_callback): Ditto.

PR modula2/110284 Remove stor-layout.o and backend header files

This patch removes stor-layout.o from the front end and also removes
back end header files from gcc-consolidation.h.

gcc/m2/ChangeLog:

PR modula2/110284
* Make-lang.in (m2_OBJS): Assign $(GM2_C_OBJS).
(GM2_C_OBJS): Remove m2/stor-layout.o.
(m2/stor-layout.o): Remove rule.
* gm2-gcc/gcc-consolidation.h (rtl.h): Remove include.
(df.h): Remove include.
(except.h): Remove include.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Fix arc assumption that insns are not re-recognized

Testing the V2 version of Manolis's fold-mem-offsets patch exposed a minor bug
in the arc backend.

The movsf_insn pattern has constraints which allow storing certain constants
to memory.  reload/lra will target those alternatives under the right
circumstances.  However the insn's condition requires that one of the two
operands must be a register.

Thus if a pass were to force re-recognition of the pattern we can get an
unrecognized insn failure.

This patch adjusts the conditions to more closely match movsi_insn.  More
specifically it allows storing a constant into a limited set of memory
operands (as defined by the Usc constraint).  movqi_insn has the same
core problem and gets the same solution.

Committed after the tester validated there are not regresisons

gcc/
* config/arc/arc.md (movqi_insn): Allow certain constants to
be stored into memory in the pattern's condition.
(movsf_insn): Similarly.

Analyze SRA candidates in ipa-fnsummary

this patch extends ipa-fnsummary to anticipate statements that will be removed
by SRA.  This is done by looking for calls passing addresses of automatic
variables.  In function body we look for dereferences from pointers of such
variables and mark them with new not_sra_candidate condition.

This is just first step which is overly optimistic.  We do not try to prove that
given automatic variable will not be SRAed even after inlining.  We now also
optimistically assume that the transformation will always happen.  I will restrict
this in a followup patch, but I think it is useful to gether some data on how
much code is affected by this.

This is motivated by PR109849 where we fail to fully inline push_back.
The patch alone does not solve the problem even for -O3, but improves
analysis in this case.

gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new parameter
ES; handle ipa_predicate::not_sra_candidate.
(evaluate_properties_for_edge): Pass es to
evaluate_conditions_for_known_args.
(ipa_fn_summary_t::duplicate): Handle sra candidates.
(dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
(load_or_store_of_ptr_parameter): New function.
(points_to_possible_sra_candidate_p): New function.
(analyze_function_body): Initialize points_to_possible_sra_candidate;
determine sra predicates.
(estimate_ipcp_clone_size_and_time): Update call of
evaluate_conditions_for_known_args.
(remap_edge_params): Update points_to_possible_sra_candidate.
(read_ipa_call_summary): Stream points_to_possible_sra_candidate
(write_ipa_call_summary): Likewise.
* ipa-predicate.cc (ipa_predicate::add_clause): Handle not_sra_candidate.
(dump_condition): Dump it.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_possible_sra_candidate.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* g++.dg/ipa/devirt-45.C: Update template.

i386: Refactor new ix86_expand_carry to set the carry flag.

This patch refactors the three places in the i386.md backend that we
set the carry flag into a new ix86_expand_carry helper function, that
allows Jakub's recently added uaddc<mode>5 and usubc<mode>5 expanders
to take advantage of the recently added support for the stc instruction.

2023-06-18 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_carry): New helper
function for setting the carry flag.
(ix86_expand_builtin) <handlecarry>: Use it here.
* config/i386/i386-protos.h (ix86_expand_carry): Prototype here.
* config/i386/i386.md (uaddc<mode>5): Use ix86_expand_carry.
(usubc<mode>5): Likewise.

i386: Standardize shift amount constants as QImode in i386.md.

This clean-up improves consistency within i386.md by using QImode for
the constant shift count in patterns that specify a mode.

2023-06-18 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (*concat<mode><dwi>3_1): Use QImode
for the immediate constant shift count.
(*concat<mode><dwi>3_2): Likewise.
(*concat<mode><dwi>3_3): Likewise.
(*concat<mode><dwi>3_4): Likewise.
(*concat<mode><dwi>3_5): Likewise.
(*concat<mode><dwi>3_6): Likewise.

RTL: Merge rtx_equal_p and hash_rtx functions with their callback variants

Use default argument when callback function is not required to merge
rtx_equal_p and hash_rtx functions with their callback variants.

gcc/ChangeLog:

* cse.cc (hash_rtx_cb): Rename to hash_rtx.
(hash_rtx): Remove.
* early-remat.cc (remat_candidate_hasher::equal): Update
to call rtx_equal_p with rtx_equal_p_callback_function argument.
* rtl.cc (rtx_equal_p_cb): Rename to rtx_equal_p.
(rtx_equal_p): Remove.
* rtl.h (rtx_equal_p): Add rtx_equal_p_callback_function
argument with NULL default value.
(rtx_equal_p_cb): Remove function declaration.
(hash_rtx_cb): Ditto.
(hash_rtx): Add hash_rtx_callback_function argument
with NULL default value.
* sel-sched-ir.cc (free_nop_pool): Update function comment.
(skip_unspecs_callback): Ditto.
(vinsn_init): Update to call hash_rtx with
hash_rtx_callback_function argument.
(vinsn_equal_p): Ditto.

RISC-V:Add float16 tuple type support

This patch adds support for the float16 tuple type.

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
New types.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
* config/riscv/riscv.md: New.
* config/riscv/vector-iterators.md: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.

Daily bump.

[contrib] validate_failures.py: Don't consider summary line in wrong place

When parsing a summary or manifest file, if we're not either after a tool
line (e.g. "=== gdb tests ===") or before a summary line (e.g.,
"=== gdb Summary ===") then the current line can't be a valid result line
so ignore it.

This addresses a problem we're seeing when running the GDB testsuite in
our CI environment where it produces a valid summary file, but then after
the "=== gdb Summary ===" section it outputs a series of Tcl errors that
match _VALID_TEST_RESULTS_REX and thus confuse the parsing logic:

05: 14:32 .sum file seems to be broken: tool="None", exp="None", summary_line="ERROR: -------------------------------------------"
05: 14:32 Traceback (most recent call last):
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 706, in <module>
05: 14:32     retval = Main(sys.argv)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 697, in Main
05: 14:32     retval = CheckExpectedResults()
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 572, in CheckExpectedResults
05: 14:32     actual = GetResults(sum_files)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 447, in GetResults
05: 14:32     build_results.update(ParseSummary(sum_fname))
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 389, in ParseSummary
05: 14:32     result = result_set.MakeTestResult(line, ordinal)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 236, in MakeTestResult
05: 14:32     return TestResult(summary_line, ordinal,
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 148, in __init__
05: 14:32     raise

contrib/ChangeLog:

* testsuite-management/validate_failures.py (IsInterestingResult):
Add result_set argument and use it.  Adjust callers.

i386: Two minor tweaks to ix86_expand_move.

This patch splits out two (independent) minor changes to i386-expand.cc's
ix86_expand_move from a larger patch, given that it's better to review
and commit these independent pieces separately from a more complex patch.

The first change is to test for CONST_WIDE_INT_P before calling
ix86_convert_const_wide_int_to_broadcast.  Whilst stepping through
this function in gdb, I was surprised that the code was continually
jumping into this function with operands that obviously weren't
appropriate.

The second change is to generalize the optimization for efficiently
moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit
vector modes.

Hence for the test case:

typedef unsigned long uv2di __attribute__ ((__vector_size__ (16)));
uv2di foo2(__int128 x) { return (uv2di)x; }

we'd previously move via memory with:

foo2:   movq    %rdi, -24(%rsp)
        movq    %rsi, -16(%rsp)
        movdqa  -24(%rsp), %xmm0
        ret

with this patch we now generate with -O2 (the same as V1TImode):

foo2:   movq    %rdi, %xmm0
        movq    %rsi, %xmm1
        punpcklqdq      %xmm1, %xmm0
        ret

and with -O2 -msse4 the even better:

foo2:   movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
        ret

The new test case is unimaginatively called sse2-v1ti-mov-2.c given
the original test case just for V1TI mode was called sse2-v1ti-mov-1.c.

2023-06-17  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is
CONST_WIDE_INT_P before calling ix86_convert_wide_int_to_broadcast.
Generalize special case for converting TImode to V1TImode to handle
all 128-bit vector conversions.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-2.c: New test case.

gcc-ar: Remove code duplication.

From c3f3b2fd53291805b5d0be19df6d1a348c5889ec Mon Sep 17 00:00:00 2001
From: Costas Argyris <costas.argyris@gmail.com>
Date: Thu, 15 Jun 2023 12:37:35 +0100
Subject: [PATCH] gcc-ar: Remove code duplication.

Preparatory refactoring that simplifies by eliminating
some duplicated code, before trying to fix 77576.
I believe this stands on its own regardless of the PR.
It also saves a nargv element when we have a plugin and
three when not.

gcc/
* gcc-ar.cc (main): Refactor to slightly reduce code
duplication. Avoid unnecessary elements in nargv.

Daily bump.

RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributions.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}

Thus there will be a problem here. For example zve32, we will have
code_for_reduc (max, VNx1QI, VNx1QI) which will return the code of
the ZVE128+ instead of the ZVE32 logically.

This patch will merge the 3 patterns into pattern, and pass both the
input_vector and the ret_vector of code_for_reduc. For example, ZVE32 will be
code_for_reduc (max, VNx1Q1, VNx8QI), then the correct code of ZVE32
will be returned as expectation.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

PR target/110265

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md
(@pred_reduc_<reduc><mode><vlmul1>): Removed.
(@pred_reduc_<reduc><mode><vlmul1_zve64>): Likewise.
(@pred_reduc_<reduc><mode><vlmul1_zve32>): Likewise.
(@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>): New pattern.
(@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.

RISC-V: Fix VL operand bug in VSETVL PASS[PR110264]

This patch fixes this issue happens on both GCC-13 and GCC-14.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

The testcase is too big and I failed to reduce it so I didn't append
test into this patch.

This patch should not only land into GCC-14 but also should backport to GCC-13.

PR target/110264

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.

libgo/testsuite: add benchmarks and examples to list

In CL 384695 I simplified the code that built lists of benchmarks,
examples, and fuzz tests, and managed to break it. This CL corrects
the code to once again make the benchmarks available, and to run
the examples with output and the fuzz targets.

Doing this revealed a test failure in internal/fuzz on 32-bit x86:
a signalling NaN is turned into a quiet NaN on the 387 floating-point
stack that GCC uses by default. This CL skips the test.

Fixes golang/go#60826

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/503798

uiltins: Add support for clang compatible __builtin_{add,sub}c{,l,ll} [PR79173]

While the design of these builtins in clang is questionable,
rather than being say
unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
so that it is clear they add two [0, 0xffffffff] range numbers
plus one [0, 1] range carry in and give [0, 0xffffffff] range
return plus [0, 1] range carry out, they actually instead
add 3 [0, 0xffffffff] values together but the carry out
isn't then the expected [0, 2] value because
0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
but just [0, 1] whether there was any overflow at all.

It is something used in the wild and shorter to write than the
corresponding
#define __builtin_addc(a,b,carry_in,carry_out) \
  ({ unsigned _s; \
     unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
     unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
     *(carry_out) = (_c1 | _c2); \
     _s; })
and so a canned builtin for something people could often use.
It isn't that hard to maintain on the GCC side, as we just lower
it to two .ADD_OVERFLOW calls early, and the already committed
pottern recognization code can then make .UADDC/.USUBC calls out of
that if the carry in is in [0, 1] range and the corresponding
optab is supported by the target.

2023-06-16  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/79173
* builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
types.
* builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
* builtins.cc (fold_builtin_addc_subc): New function.
(fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
* doc/extend.texi (__builtin_addc, __builtin_subc): Document.

* gcc.target/i386/pr79173-11.c: New test.
* gcc.dg/builtin-addc-1.c: New test.

tree-ssa-math-opts: Fix up uaddc/usubc pattern matching [PR110271]

The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is. It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.

So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.

The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.

2023-06-16 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
<case PLUS_EXPR>: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.

* gcc.c-torture/compile/pr110271.c: New test.