git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: arm: Relax cbranch tests to accept inverted branches

Similar to PR113502, but for non-aarch64 test.
The test started to fail after r14-7243-gafac1bd3365.

gcc/testsuite/ChangeLog:

* gcc.target/arm/vect-early-break-cbranch.c: Ignore exact
branch.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Update expected asm in armv8_2-fp16-neon-2.c

With the changes in r15-1579-g792f97b44ff, the test_vmul_n_16x8 function
does not contain any vdup.16 q* r* instruction with -mfloat-abi=softfp.

The differnce between r15-1578-g5185274c76c and r15-1579-g792f97b44ff
with -mfloat-abi=softfp for the function is:
        .global test_vmul_n_16x8
        .syntax unified
        .arm
        .type   test_vmul_n_16x8, %function
test_vmul_n_16x8:
        @ args = 4, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov    d16, r0, r1  @ v8hf
        vmov    d17, r2, r3
-       ldrh    r3, [sp]        @ __fp16
-       vdup.16 q9, r3
+       vld1.16 {d18[], d19[]}, [sp]
        vmul.f16        q8, q9, q8
        vmov    r0, r1, d16  @ v8hf
        vmov    r2, r3, d17
        bx      lr
        .size   test_vmul_n_16x8, .-test_vmul_n_16x8

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-neon-2.c: Expect 3 vdup.16 q* r*
when in arm_hf_eabi else 2.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

libgccjit: Add count zeroes builtins to ensure_optimization_builtins_exist

gcc/jit/ChangeLog:

* jit-builtins.cc (ensure_optimization_builtins_exist): Add
missing builtins.

ada: Move special case for null string literal from frontend to backend

Previously the lower bound of string literals indexed by non-static
integer types was artificially set to 1 in the frontend. This was to
avoid an overflow in calculation of a null string size by the GCC
backend, which was causing an excessively large binary object file.

However, setting the lower bound to 1 was problematic for GNATprove,
which could not easily retrieve the lower bound of string literals.

This patch avoids the overflow in GCC by recognizing null string literal
subtypes in Gigi.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Recognize null
string literal subtypes and set their bounds to 1 .. 0.

ada: Remove special case for the size of a string literal subtype

Apparently we no longer need to ignore string literal subtypes case
when validating size of a type entity.

Code cleanup; behavior appears to be unaffected.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Remove special
case for string literal subtypes.

ada: Fix ancient typo in process_decls

It has gone unnoticed for decades because it changes nothing in practice.

gcc/ada/ChangeLog:

* gcc-interface/trans.cc (process_decls): Remove tests on Nkind that
contain a typo and would be redundant if written correctly.

ada: Split Library_Unit using multiple wrappers

The Library_Unit field was used for all sorts of different purposes,
which led to confusing code.

This patch splits Library_Unit into much more specific wrapper
subprograms that should be called instead of [Set_]Library_Unit.
Predicates and pragmas Assert are used to catch misuses of these.
We document the semantics, especially "surprising" cases (e.g.
internally-generated with clauses can refer to package bodies).

This change does not fix gigi, codepeer, spark, or llvm
to use the new wrappers; so far, they are used only in
the GNAT front end.

gcc/ada/ChangeLog:

* sinfo.ads (Library_Unit): Rewrite documentation. Note that
the "??? not (always) true..." comment was not true;
the Subunit_Parent never points to the spec.
(N_Compilation_Unit): Improve documentation. The Aux_ node
was not created to solve the mentioned problems; it was
created because the size of nodes was limited.
Misc doc improvements.
* sinfo-utils.ads: Add new wrappers for Library_Unit field.
Use subtypes with predicates for the parameters.
(First_Real_Statement): Still used in codepeer.
* sinfo-utils.adb: Add new wrappers for Library_Unit field,
with suitable assertions.
* sem_prag.adb: Use new field wrapper names.
(Matching_Name): New name for Same_Name to avoid
potential confusion with the other function with the
same name (Sem_Util.Same_Name), which is also called
in this same file.
(Matching_Convention): Change Same_Convention to match
Matching_Name.
* sem_util.ads (Same_Name): Improve comments; the old comment
implied that it works for all names, which was not true.
* sem_util.adb: Use new field wrapper names.
* gen_il-gen.adb: Rename N_Unit_Body to be N_Lib_Unit_Body.
Plain "unit" is ambiguous in Ada (library unit, compilation
unit, program unit, etc).
Add new union types N_Lib_Unit_Declaration and
N_Lib_Unit_Renaming_Declaration.
* gen_il-gen-gen_nodes.adb (Compute_Ranges): Raise exception
earlier (it is already raised later, in Verify_Type_Table).
Add a comment explaining why it might be raised.
* gen_il-types.ads: Rename N_Unit_Body to be N_Lib_Unit_Body, and add
new N_Lib_Unit_Declaration and N_Lib_Unit_Renaming_Declaration.
* einfo.ads: Fix obsolete comment (was left over from before
the "variable-sized nodes").
* exp_ch7.adb: Use new field wrapper names.
* exp_disp.adb: Use new field wrapper names.
* exp_unst.adb: Use new field wrapper names.
* exp_util.adb: Use new field wrapper names.
* fe.h: Add new field wrapper names. These are currently not
used in gigi, but this change prepares for using them in
gigi.
* inline.adb: Use new field wrapper names.
* lib.adb: Use new field wrapper names.
Comment improvements.
* lib-load.adb: Use new field wrapper names.
Minor cleanup.
* lib-writ.adb: Use new field wrapper names.
* live.adb: Use new field wrapper names.
* par-load.adb: Use new field wrapper names.
Comment improvements. Minor cleanup.
* rtsfind.adb: Use new field wrapper names.
* sem.adb: Use new field wrapper names.
* sem_ch10.adb: Use new field wrapper names.
Comment improvements. Minor cleanup.
* sem_ch12.adb: Use new field wrapper names.
* sem_ch7.adb: Use new field wrapper names.
* sem_ch8.adb: Use new field wrapper names.
* sem_elab.adb: Use new field wrapper names.
Comment improvements.
* errout.adb (Output_Source_Line): Fix blowup in some
obscure cases, where List_Pragmas is not fully set up.

ada: Improve Unbounded_Wide_String performance

Improve performance of iteration using Element function.
Improve performance of Append.

gcc/ada/ChangeLog:

* libgnat/a-stwiun__shared.adb: Restructure code to inline only
the most common cases. Remove whenever possible runtime checks.
* libgnat/a-stwiun__shared.ads: Add Inline => True to Append
variants and Element.

ada: Improve performance of Unbounded_Wide_Wide_String

Improve performance of iteration using Element function.
Improve performance of Append.

gcc/ada/ChangeLog:

* libgnat/a-stzunb__shared.adb: Restructure code to inline only
the most common cases. Remove whenever possible runtime checks.
* libgnat/a-stzunb__shared.ads: Add Inline => True to Append
variants and Element.

ada: Improve Unbounded_String performance

Improve performance of iteration using Element function.
Improve performance of Append.

gcc/ada/ChangeLog:

* libgnat/a-strunb__shared.adb: Restructure code to inline only
the most common cases. Remove whenever possible runtime checks.
* libgnat/a-strunb__shared.ads: Add Inline => True to Append
variants and Element.

ada: Initial implementation of Extended_Access aspect (FE portion only)

The Extended_Access aspect can be specified to be True for certain
access-to-unconstrained-array-subtype types. Such extended access types
can designate objects that a normal general access type (with the same
designated subtype) cannot, such as a slice of an aliased array object
or an object that is represented without contiguous bounds information.

gcc/ada/ChangeLog:

* aspects.ads: Add Aspect_Extended_Access to Aspect_Id
enumeration.
* par-prag.adb: Add Pragma_Extended_Access to list of pragmas that
get no interesting processing in the parser.
* sem_attr.adb: Relax legality checks on Access/Unchecked_Access
attribute references if access type is Extended_Access.
* sem_ch12.adb (Validate_Access_Type_Instance): For an instance of
a generic with a formal access type, check that formal and actual
agree with with respect to Extended_Access aspect.
* sem_prag.adb (Analyze_Pragma): Add analysis code for pragma
Extended_Access. Set Pragma_Extended_Access element in Sig_Flags
aggregate.
* sem_prag.ads: Set Pragma_Extended_Access element in
Aspect_Specifying_Pragma aggregate.
* sem_res.adb (Valid_Conversion): Disallow
extended-to-not-extended access conversion.
* sem_util.adb (Is_Extended_Access_Access_Type): Implement new
function.
(Is_Aliased_View): If (and only if) the new Boolean For_Extended
parameter is True, then a slice of an aliased non-bitpacked array
is aliased, a constrained nominal subtype does not force a result
of False, and a dereference of an extended access value is
aliased. The last point is somewhat subtle. This is how we prevent
covert fat-to-nonfat type conversions via things like
"Not_Extended_Type'(Extended_Ptr.all'Access)" or passing
Extended_Ptr.all as an actual parameter corresponding to an
explicitly aliased formal parameter.
* sem_util.ads (Is_Extended_Access_Type): Declare new function.
(Is_Aliased_View): Add new defaults-False parameter For_Extended.
* snames.ads-tmpl: Declare Name_Extended_Access Name_Id constant
and Pragma_Extended_Access Pragma_Id enumeration literal.

ada: Avoid unused with warning with Extend_System

When the Extend_System pragma is used then we are supposed
to check the extended system for referenced entities. Otherwise
we would get an incorrect unused with warning.

This was previously done on body files but it should also be
done specs as well.

gcc/ada/ChangeLog:

* sem_warn.adb (Check_One_Unit): When a system extension is
present always check entities from that unit before marking
the unit unreferenced.

ada: Fix crash on default value with nested iterated component associations

The problem is that the freeze node for the type of the element ends up in
the component list of the record type declared with the default value.

gcc/ada/ChangeLog:

PR ada/113036
* freeze.adb (Freeze_Expression): Deal with freezing actions coming
from within nested internal loops present in spec expressions.

ada: Propagate resolution status from Resolve_Iterated_Component_Association

The resolution status of Resolve_Aggr_Expr is lost when the routine is
invoked indirectly from Resolve_Iterated_Component_Association.

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Iterated_Component_Association): Change to
function returning Boolean and return the result of the call made
to Resolve_Aggr_Expr.
(Resolve_Array_Aggregate): Return failure status if the call to
Resolve_Iterated_Component_Association returns false.

ada: Remove dead code in Resolve_Iterated_Component_Association

It dates back to when analysis was performed on a copy of the expression.

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Iterated_Component_Association): Move up
declaration of Expr and remove dead code from older processing.

ada: Update documentation for -gnatVxx switches

Imporve the wording to explicitly state which options are turned on
by -gnatVa and that -gnatVd is enabled by default.

It can be somewhat hard to decifer that information from the old
wording. Especially when compared to -gnatWxx switches where there
is an elaborate scheme for describing those properties.

gcc/ada/ChangeLog:

* usage.adb: Update the wording for -gnatVa and -gnatVd.

ada: Tweak description of new predicate

The existing comment is a bit too vague.

gcc/ada/ChangeLog:

* exp_aggr.ads (Is_Two_Pass_Aggregate): Beef up comment.

ada: Display message on reproducer generation failure

Before this patch, nothing was reported to the user when an exception
was raised during generation of a minimal reproducer. This patch fixes
this.

gcc/ada/ChangeLog:

* comperr.adb (Compiler_Abort): Display message in exception handler.

ada: Missing runtime check in interpolated string

When the type imposed by the context for an interpolated string is
constrained, the compiler silently omits adding a runtime check.

gcc/ada/ChangeLog:

* exp_ch2.adb (Expand_N_Interpolated_String_Literal): Use the
base type of the type imposed by the context for building the
interpolated string image; required to allow the expander adding
the missing runtime check when the target type is constrained.
(Apply_Static_Length_Check): New subprogram.

ada: Add CHERI variant of full secondary stack allocator

gcc/ada/ChangeLog:

* Makefile.rtl: Use s-secsta__cheri.adb on Morello CheriBSD.
* libgnat/s-secsta__cheri.adb: New file.

ada: CheriBSD: add SIGPROT handler

gcc/ada/ChangeLog:

* libgnarl/s-intman__cheribsd.adb: Add SIGPROT to interrupt list.

ada: Add Invocation node to the SARIF report

Add an invocation node to the SARIF report that contains the
command line use to activate gnat and whether the execution was
successful or not.

gcc/ada/ChangeLog:

* diagnostics-sarif_emitter.adb (Print_Runs): Add printing for
the invocation node that consists of a single invocations that
is composed of the commandLine and executionSuccessful attributes.

ada: Add Schema to the SARIF report

gcc/ada/ChangeLog:

* diagnostics-sarif_emitter.adb (Print_SARIF_Report): Add a
Schema field to the SARIF report.

ada: Tweak CPU affinity handling

The primary motivation for this change is making the taskset command
line tool work as expected for tasking programs that don't use features
from section D.16 of the Ada reference manual. A couple of components
are added to the ATCB record to make it possible to tell values that
come from explicit aspects and subprogram calls from values that are
inherited from activating tasks.

gcc/ada/ChangeLog:

* libgnarl/s-mudido__affinity.adb (Unchecked_Set_Affinity): Set new
ATCB component.
* libgnarl/s-taprop__linux.adb (Create_Task): Only set CPU affinity
when required.
(Requires_Affinity_Change): New subprogram.
(Set_Task_Affinity): Likewise.
* libgnarl/s-tarest.adb (Create_Restricted_Task): Adapt to
Initialize_ATCB change.
* libgnarl/s-taskin.adb (Initialize_ATCB): Update parameter list.
Record whether aspects were explicitly specified.
* libgnarl/s-taskin.ads (Common_ATCB): Add component.
* libgnarl/s-tassta.adb (Create_Task): Update call to Initialize_ATCB.
* libgnarl/s-tporft.adb (Register_Foreign_Thread): Likewise.

ada: Fix error message for pragma First_Controlling_Parameter

gcc/ada/ChangeLog:

* sem_prag.adb (Analyze_Pragma): Fix format for second line of
warning (should be a continuation line)

ada: Build and runtime support for CheriBSD

SIGPROT is a new signal on CheriBSD that signals a CHERI protection violation.
The full runtime converts these to the appropriate Ada exception declared in
Interfaces.CHERI.Exceptions.

gcc/ada/ChangeLog:

* Makefile.rtl: Build support for Morello CheriBSD.
* libgnarl/s-intman__cheribsd.adb: New file for CheriBSD.
* libgnarl/s-osinte__cheribsd.ads: New file for CheriBSD.

ada: Refactor exception declarations from Interfaces.CHERI to separate package

Exception declarations require elaboration on the full run-time to
register the exceptions. The package Interfaces.CHERI, however, is
used on bare-metal targets during early initialization, before
elaboration and is therefore marked No_Elaboration_Code_All.
Refactoring the exception declarations to a separate package allows
the common CHERI bindings to be used in such contexts.

gcc/ada/ChangeLog:

* libgnat/i-cheri.ads: Remove exception declarations.
* libgnat/i-cheri-exceptions.ads: New file.

ada: Fix alignment of pthread_mutex_t

On most targets the alignment of unsigned long is the same as pointer
alignment, but on CHERI targets pointers have larger alignment (16 bytes
compared to 8 bytes). pthread_mutex_t needs the same alignment as
System.Address to account for CHERI targets.

gcc/ada/ChangeLog:

* libgnat/s-oslock__posix.ads: Fix alignment of pthread_mutex_t
for CHERI targets.

ada: Move formal hash tables from gnat repository to the SPARK library

The formal containers have been part of the SPARK library for some
time now. However, some units used only by these containers are still
part of the gnat repository. Move them to the SPARK library.

gcc/ada/ChangeLog:

* Makefile.rtl: Remove references to moved units.
* libgnat/a-chtgfk.adb: Removed.
* libgnat/a-chtgfk.ads: Removed.
* libgnat/a-chtgfo.adb: Removed.
* libgnat/a-chtgfo.ads: Removed.
* libgnat/a-cohata.ads (Generic_Formal_Hash_Table_Types): Removed.

ada: Add doc for deep delta aggregates

gcc/ada/ChangeLog:

* doc/gnat_rm/gnat_language_extensions.rst: Adjust documentation.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Fix internal error on alignment clause for type declared in generic unit

The front-end raises Program_Error on an alignment clause for a type in a
generic unit that references the alignment of another type in the unit.

gcc/ada/ChangeLog:

PR ada/117051
* freeze.adb (Freeze_Entity): Call the layout procedure on subtypes
declared in a generic unit when they are static.

ada: Minor tweaks in comments

They are related to the special support for text encoding on Windows.

gcc/ada/ChangeLog:

* adaint.c: Replace initialize.c with rtinit.c in comment.
* sysdep.c (__gnat_set_mode): Fix reference in comment.
* libgnat/i-cstrea.ads (Content_Encoding): Adjust comment.

ada: Correction to disable self-referential with_clauses

Follow-on to previous change "Disable self-referential with_clauses",
which caused some regressions. Remove useless use clauses referring
to useless self-referential with'ed packages. This is necessary
because in some cases, such use clauses cause the compiler to
crash or give spurious errors.

In addition, enable the warning on self-referential with_clauses.

gcc/ada/ChangeLog:

* sem_ch10.adb (Analyze_With_Clause): In the case of a
self-referential with clause, if there is a subsequent use clause
for the same package (which is necessarily useless), remove it from
the context clause. Reenable the warning.

ada: Missing precondition runtime check in inherited primitive

When a derived tagged type implements interface types in addition
to deriving from its parent type, and a primitive inherited from
its parent type corresponds to an inherited primitive that has
class-wide preconditions, then the generated code fails to check
the class-wide preconditions inherited from the interface primitive.

gcc/ada/ChangeLog:

* einfo.ads (Is_Dispatch_Table_Wrapper): Complete documentation.
* exp_ch6.adb (Install_Class_Preconditions_Check): Dispatch table
wrappers do not require installing the check since it is performed
by the caller.
(Class_Preconditions_Subprogram): Use new predicate Is_LSP_Wrapper.
* freeze.adb (Check_Inherited_Conditions): Rename Postcond_Wrappers to
Condition_Wrappers to handle implicitly inherited subprograms that
implement pre-/postconditions inherited from interface primitives.
Use new predicate Is_LSP_Wrapper.
* sem_disp.adb (Check_Dispatching_Operation): Complete assertion to
handle functions returning class-wide types.
* exp_util.ads (Is_LSP_Wrapper): New subprogram.
* exp_util.adb (Is_LSP_Wrapper): New subprogram.
* contracts.adb (Process_Spec_Postconditions): Use Is_LSP_Wrapper.
(Process_Inherited_Conditions): Use Is_LSP_Wrapper.
* sem_ch6.adb (New_Overloaded_Entity): Use Is_LSP_Wrapper.
* sem_util.adb (Nearest_Class_Condition_Subprogram): Use Is_LSP_Wrapper.

ada: Fix visibility of Taft amendment types

When uninstalling private package declarations we must mark Taft
amendment types hidden, just like we mark other types.

Looking at previous revisions of this code, it is quite clear that this
bug comes from a code evolution and marking types should happen in all
ELSE branches of the enclosing IF statement.

gcc/ada/ChangeLog:

* sem_ch7.adb (Uninstall_Declarations): Mark Taft amendment
types like we mark other types declared in private package
declarations.

ada: Minor whitespace tuning

Code cleanup.

gcc/ada/ChangeLog:

* exp_ch4.adb (Expand_N_Op_Multiply): Remove extra whitespace.

ada: Avoid run-time conversion of 0 from Int to Uint

Code cleanup and tiny performance improvement; semantics is unaffected.

gcc/ada/ChangeLog:

* exp_ch4.adb (Expand_N_Op_Subtract): Replace numeric literal
with universal integer constant, just like it is done in
expansion of addition operator.

ada: Assignment local variable only when it is used

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* sem_res.adb (In_Decl): Rename and move local variable where
it is used.

ada: Add null exclusion to avoid run-time checks

By declaring access parameter with non-null qualifier, the compiler
should avoid generating run-time checks in debug builds, resulting in
a tiny performance improvement.

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* sem_res.adb (Type_In_P): Add non-null qualifier.

ada: Resolve intrinsic operators without homonyms

Intrinsic operators are resolved by rewriting into a corresponding
operator from the Standard package. Traversing homonyms just to find the
corresponding operator was not particularly efficient; also, for the
binary "-" it was finding the unary "-".

There appears to be no difference in compiler behavior, but the new code
should be more efficient and finding the correct operator seems to make
more sense.

gcc/ada/ChangeLog:

* sem_res.adb (Resolve_Intrinsic_Operator)
(Resolve_Intrinsic_Unary_Operator): Replace traversals of
homonyms with a direct lookup.

ada: Fix asymmetry in resolution of unary intrinsic operators

Resolution of binary and unary intrinsic operators differed when
expansion was inactive. In particular, this affected GNATprove
handling of Ada.Real_Time."abs" operator. This patch makes unary
resolution behave like binary resolution.

gcc/ada/ChangeLog:

* sem_res.adb (Resolve_Intrinsic_Unary_Operator): Disable when
expansion is inactive.

PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE

This is, in effect, a reapplication of de2bc6a7367aca2eecc925ebb64cfb86998d89f3
fixing the compile-time hog in var-tracking due to calling simplify_rtx
on the two arms of the rotation before detecting the ROTATE.
That is not necessary.

simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when
C1 + C2 == mode-width.  But the transformation is also valid for PLUS and XOR.
Indeed GIMPLE can also do the fold.  Let's teach RTL to do it too.

The motivating testcase for this is in AArch64 intrinsics:

uint64x2_t G2(uint64x2_t a, uint64x2_t b) {
    uint64x2_t c = veorq_u64(a, b);
    return veorq_u64(vaddq_u64(c, c), vshrq_n_u64(c, 63));
}

which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but
GCC was failing to detect the rotate operation for two reasons:
1) The combination of the two arms of the expression is done under XOR rather
than IOR that simplify-rtx currently supports.
2) The ASHIFT operation is actually a (PLUS X X) operation and thus is not
detected as the LHS of the two arms we require.

The patch fixes both issues.  The analysis of the two arms of the rotation
expression is factored out into a common helper simplify_rotate_op which is
then used in the PLUS, XOR, IOR cases in simplify_binary_operation_1.

The check-assembly testcase for this is added in the following patch because
it needs some extra AArch64 backend work, but I've added self-tests in this
patch to validate the transformation.

Bootstrapped and tested on aarch64-none-linux-gnu

Signed-off-by: Kyrylo Tkachov <ktachov@nvidia.com>
PR target/117048
* simplify-rtx.cc (extract_ashift_operands_p): Define.
(simplify_rotate_op): Likewise.
(simplify_context::simplify_binary_operation_1): Use the above in
the PLUS, IOR, XOR cases.
(test_vector_rotate): Define.
(test_vector_ops): Use the above.

target: Fix asm codegen for vfpclasss* and vcvtph2* instructions

This only happens when using -masm=intel.

gcc/ChangeLog:
PR target/116725
* config/i386/sse.md: Fix asm generation.

gcc/testsuite/ChangeLog:
PR target/116725
* gcc.target/i386/pr116725.c: Add test using those AVX builtins.

Don't call invert on VARYING.

When all cases go to one label and resul in a VARYING value, we can't
invert that value to remove all values from the default case. Simply
check for this case and set the default to UNDEFINED.

PR tree-optimization/117398
gcc/
* gimple-range-edge.cc (gimple_outgoing_range::calc_switch_ranges):
Check for VARYING and don't call invert () on it.

gcc/testsuite/
* gcc.dg/pr117398.c: New.

Revert "PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE"

This reverts commit de2bc6a7367aca2eecc925ebb64cfb86998d89f3.

aarch64: Fix incorrect LS64 documentation

As Yuta Mukai pointed out, the manual wrongly said that LS64 is
enabled by default for Armv8.7-A and above, and for Armv9.2-A
and above. LS64 is not mandatory at any architecture level
(and the code correctly implemented that).

I think this was a leftover from an early version of the spec.

gcc/
* doc/invoke.texi: Fix documentation of LS64 so that it's
not implied by Armv8.7-A or Armv9.2-A.

aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

This patch adds initial support for FUJITSU-MONAKA CPU.
The cost model will be corrected in the future.

2024-11-04 Yuta Mukai <mukai.yuta@fujitsu.com>

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add fujitsu-monaka.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include fujitsu-monaka tuning model.
* doc/invoke.texi: Document -mcpu=fujitsu-monaka.
* config/aarch64/tuning_models/fujitsu_monaka.h: New file.

libstdc++: Fix up 117406.cc test [PR117406]

Christophe mentioned in bugzilla that the test FAILs on aarch64,
I'm not including <climits> and use INT_MAX.
Apparently during my testing I got it because the test preinclude
-include bits/stdc++.h
and that includes <climits>, dunno why that didn't happen on aarch64.
In any case, either I can add #include <climits>, or because the
test already has #include <limits> I've changed uses of INT_MAX
with std::numeric_limits<int>::max(), that should be the same thing.
But if you prefer
#include <climits>
I can surely add that instead.

2024-11-04 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/117406
* testsuite/26_numerics/headers/cmath/117406.cc: Use
std::numeric_limits<int>::max() instead of INT_MAX.

Move vect_update_inits_of_drs

Move vect_update_inits_of_drs to after setting up the epilog
metadata.

* tree-vect-loop.cc (update_epilogue_loop_vinfo): Update
DR inits after adjusting the epilog metadata.

Preserve ->move_dr behavior when adjusting epilogue info

When update_epilogue_loop_vinfo relates the shared loop DRs with
the epilogue stmts and infos it should not fiddle with how
pattern recognition applied move_dr.

* tree-vect-loop.cc (update_epilogue_loop_vinfo): A DRs
main stmt vinfo dr_aux should refer to a pattern stmt
which is how move_dr sets this up. We shouldn't undo this.

Move updated versioning threshold compute

The following moves computing the combined main + epilogue loop
versioning threshold until we figured the epilogues to use rather
than incrementally updating it with the chance to joust candidates
after the fact.

* tree-vect-loop.cc (vect_analyze_loop): Move lowest_th
compute until after epilogue_vinfos is final.

Add regression test

This is for the latest fix made to Selected_Length_Checks in Checks.

gcc/testsuite
* gnat.dg/specs/array7.ads: New test.

simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)

With recent patch to improve detection of vector rotates at RTL level
combine now tries matching a V8HImode rotate by 8 in the example in the
testcase.  We can teach AArch64 to emit a REV16 instruction for such a rotate
but really this operation corresponds to the RTL code BSWAP, for which we
already have the right patterns.  BSWAP is arguably a simpler representation
than ROTATE here because it has only one operand, so let's teach simplify-rtx
to generate it.

With this patch the testcase now generates the simplest form:
.L2:
        ldr     q31, [x1, x0]
        rev16   v31.16b, v31.16b
        str     q31, [x0, x2]
        add     x0, x0, 16
        cmp     x0, 2048
        bne     .L2

instead of the previous:
.L2:
        ldr     q31, [x1, x0]
        shl     v30.8h, v31.8h, 8
        usra    v30.8h, v31.8h, 8
        str     q30, [x0, x2]
        add     x0, x0, 16
        cmp     x0, 2048
        bne     .L2

IMO ideally the bswap detection would have been done during vectorisation
time and used the expanders for that, but teaching simplify-rtx to do this
transformation is fairly straightforward and, unlike at tree level, we have
the native RTL BSWAP code.  This change is not enough to generate the
equivalent sequence in SVE, but that is something that should be tackled
separately.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI).

gcc/testsuite/

* gcc.target/aarch64/rot_to_bswap.c: New test.

aarch64: Emit XAR for vector rotates where possible

We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it.  This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts.  This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit results is later accessed through the right subregs.

This strategy is used whenever we have XAR instructions, the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it's expected to generate a single REV*
instruction or when XAR instructions are not present.

With this patch we can gerate for the input:
v4si
G1 (v4si r)
{
    return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, #23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:
G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret
G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
generation of XAR sequences when possible.

gcc/testsuite/

* gcc.target/aarch64/rotate_xar_1.c: New test.

aarch64: Optimize vector rotates as vector permutes where possible

Some vector rotate operations can be implemented in a single instruction
rather than using the fallback SHL+USRA sequence.
In particular, when the rotate amount is half the bitwidth of the element
we can use a REV64,REV32,REV16 instruction.
More generally, rotates by a byte amount can be implented using vector
permutes.
This patch adds such a generic routine in expmed.cc called
expand_rotate_as_vec_perm that calculates the required permute indices
and uses the expand_vec_perm_const interface.

On aarch64 this ends up generating the single-instruction sequences above
where possible and can use LDR+TBL sequences too, which are a good choice.

With help from Richard, the routine should be VLA-safe.
However, the only use of expand_rotate_as_vec_perm introduced in this patch
is in aarch64-specific code that for now only handles fixed-width modes.

A runtime aarch64 test is added to ensure the permute indices are not messed
up.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* expmed.h (expand_rotate_as_vec_perm): Declare.
* expmed.cc (expand_rotate_as_vec_perm): Define.
* config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
Declare prototype.
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
Call the above.

gcc/testsuite/

* gcc.target/aarch64/vec-rot-exec.c: New test.
* gcc.target/aarch64/simd/pr117048_2.c: New test.

PR 117048: aarch64: Add define_insn_and_split for vector ROTATE

The ultimate goal in this PR is to match the XAR pattern that is represented
as a (ROTATE (XOR X Y) VCST) from the ACLE intrinsics code in the testcase.
The first blocker for this was the missing recognition of ROTATE in
simplify-rtx, which is fixed in the previous patch.
The next problem is that once the ROTATE has been matched from the shifts
and orr/xor/plus, it will try to match it in an insn before trying to combine
the XOR into it. But as we don't have a backend pattern for a vector ROTATE
this recog fails and combine does not try the followup XOR+ROTATE combination
which would have succeeded.

This patch solves that by introducing a sort of "scaffolding" pattern for
vector ROTATE, which allows it to be combined into the XAR.
If it fails to be combined into anything the splitter will break it back
down into the SHL+USRA sequence that it would have emitted.
By having this splitter we can special-case some rotate amounts in the future
to emit more specialised instructions e.g. from the REV* family.
This can be done if the ROTATE is not combined into something else.

This optimisation is done in the next patch in the series.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

PR target/117048
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
New define_insn_and_split.

gcc/testsuite/

PR target/117048
* gcc.target/aarch64/simd/pr117048.c: New test.

aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes

The MD pattern for the XAR instruction in SVE2 is currently expressed with
non-canonical RTL by using a ROTATERT code with a constant rotate amount.
Fix it by using the left ROTATE code.  This necessitates splitting out the
expander separately to translate the immediate coming from the intrinsic
from a right-rotate to a left-rotate immediate.

Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
operation for Advanced SIMD modes where the TARGET_SHA3 cannot be used
(that can only handle V2DImode operands).  Therefore let's extend the accepted
modes of the SVE2 patternt to include the Advanced SIMD integer modes.

This leads to some tests for the svxar* intrinsics to fail because they now
simplify to a plain EOR when the rotate amount is the width of the element.
This simplification is desirable (EOR instructions have better or equal
throughput than XAR, and they are non-destructive of their input) so the
tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when
it is available (TARGET_SHA3) because it is non-destructive, so restrict the
SVE2 pattern accordingly.  Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>):
Use SVE_ASIMD_FULL_I modes.  Use ROTATE code for the rotate step.
Adjust output logic.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define.
(svxar): Use the above.

gcc/testsuite/

* gcc.target/aarch64/xar_neon_modes.c: New test.
* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
XAR.
* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.

PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE

simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when
C1 + C2 == mode-width.  But the transformation is also valid for PLUS and XOR.
Indeed GIMPLE can also do the fold.  Let's teach RTL to do it too.

The motivating testcase for this is in AArch64 intrinsics:

uint64x2_t G2(uint64x2_t a, uint64x2_t b) {
    uint64x2_t c = veorq_u64(a, b);
    return veorq_u64(vaddq_u64(c, c), vshrq_n_u64(c, 63));
}

which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but
GCC was failing to detect the rotate operation for two reasons:
1) The combination of the two arms of the expression is done under XOR rather
than IOR that simplify-rtx currently supports.
2) The ASHIFT operation is actually a (PLUS X X) operation and thus is not
detected as the LHS of the two arms we require.

The patch fixes both issues.  The analysis of the two arms of the rotation
expression is factored out into a common helper simplify_rotate which is
then used in the PLUS, XOR, IOR cases in simplify_binary_operation_1.

The check-assembly testcase for this is added in the following patch because
it needs some extra AArch64 backend work, but I've added self-tests in this
patch to validate the transformation.

Bootstrapped and tested on aarch64-none-linux-gnu

Signed-off-by: Kyrylo Tkachov <ktachov@nvidia.com>
PR target/117048
* simplify-rtx.cc (extract_ashift_operands_p): Define.
(simplify_rotate_op): Likewise.
(simplify_context::simplify_binary_operation_1): Use the above in
the PLUS, IOR, XOR cases.
(test_vector_rotate): Define.
(test_vector_ops): Use the above.

Daily bump.

docs: Document that __builtin_assoc_barrier also can be used for FMAs [PR115023]

I noticed that __builtin_assoc_barrier makes a differnce for FMAs formation
but it was not documented. This adds that documentation even with a small example.

Build the HTML documents to make sure everything looks correct.

gcc/ChangeLog:

PR middle-end/115023
* doc/extend.texi (__builtin_assoc_barrier): Document ffp-contract=fast
and FMA usage.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

match: Fix `a != 0 ? a - 1 : 0` pattern [PR117363]

There are a couple of things wrong with this pattern which
I missed during the review. First each nop_convert should
be nop_convert1 or nop_convert2.
Second is we need to the minus in the same type as the minus
was originally so we don't introduce extra undefined behavior
(signed integer overflow). And we need a convert into the new
type too.

pr117363-1.c tests not introducing extra undefined behavior.
pr117363-2.c tests the casting to the correct final type, ldist
introduces the cond_expr here.

Bootstraped and tested on x86_64-linux-gnu.

PR tree-optimization/117363

gcc/ChangeLog:

* match.pd (`a != 0 ? a - 1 : 0`): Fix type handling
and nop_convert handling.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117363-1.c: New test.
* gcc.dg/torture/pr117363-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fortran: Fix associate_69.f90 that fails on some platforms [PR115700]

2024-11-03 Paul Thomas <pault@gcc.gnu.org>

gcc/testsuite/
PR fortran/115700
* gfortran.dg/associate_69.f90: Remove the test that produces a
variable string length because the optimized count depends on
the platform. This is tested in associate_70.f90.

Daily bump.

testsuite: Require fpic support for pr116887.c

Test case pr116887.c is passing -fpic, so mark it as such.

With this patch the test is now properly marked as unsupported
for pru-unknown-elf. Test still passes for x86_64-pc-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/pr116887.c: Require effective target fpic.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

testsuite: Require trampoline support for pr117245.c

Test case pr117245.c is using trampolines, so mark it as such.

With this patch the test is now properly marked as unsupported
for pru-unknown-elf. Test still passes for x86_64-pc-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117245.c: Require effective target with trampolines.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Add UMASKR and UMASKL intrinsics.

gcc/fortran/ChangeLog:

* check.cc (gfc_check_mask): Handle BT_INSIGNED.
* gfortran.h (enum gfc_isym_id): Add GFC_ISYM_UMASKL and
GFC_ISYM_UMASKR.
* gfortran.texi: List UMASKL and UMASKR, remove unsigned future
unsigned arguments for MASKL and MASKR.
* intrinsic.cc (add_functions): Add UMASKL and UMASKR.
* intrinsic.h (gfc_simplify_umaskl): New function.
(gfc_simplify_umaskr): New function.
(gfc_resolve_umasklr): New function.
* intrinsic.texi: Document UMASKL and UMASKR.
* iresolve.cc (gfc_resolve_umasklr): New function.
* simplify.cc (gfc_simplify_umaskr): New function.
(gfc_simplify_umaskl): New function.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_39.f90: New test.

libstdc++: Fix up std::{,b}float16_t std::{ilogb,l{,l}r{ound,int}} [PR117406]

These overloads incorrectly cast the result of the float __builtin_*
to _Float or __gnu_cxx::__bfloat16_t.  For std::ilogb that changes
behavior for the INT_MAX return because that isn't representable in
either of the floating point formats, for the others it is I think
just a very inefficient hop from int/long/long long to std::{,b}float16_t
and back.  I mean for the round/rint cases, either the argument is small
and then the return value should be representable in the floating point
format too, or it is too large that the argument is already integral
and then it should just return the argument with the round trips.
Too large value is unspecified unlike ilogb.

2024-11-02  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117406
* include/c_global/cmath (std::ilogb(_Float16), std::llrint(_Float16),
std::llround(_Float16), std::lrint(_Float16), std::lround(_Float16)):
Don't cast __builtin_* return to _Float16.
(std::ilogb(__gnu_cxx::__bfloat16_t),
std::llrint(__gnu_cxx::__bfloat16_t),
std::llround(__gnu_cxx::__bfloat16_t),
std::lrint(__gnu_cxx::__bfloat16_t),
std::lround(__gnu_cxx::__bfloat16_t)): Don't cast __builtin_* return to
__gnu_cxx::__bfloat16_t.
* testsuite/26_numerics/headers/cmath/117406.cc: New test.

gimplify: Fix up RAW_DATA_CST related ICE [PR117384]

Apparently tree_output_constant_def doesn't strictly guarantee that the
returned VAR_DECL will have the same or uselessly convertible type as
the type of the constant passed to it, compare_constants says:
/* For arrays, check that mode, size and storage order match.  */
/* For record and union constructors, require exact type equality.  */
The older use of tree_output_constant_def in gimplify.cc was already
handling this right:
               ctor = tree_output_constant_def (ctor);
               if (!useless_type_conversion_p (type, TREE_TYPE (ctor)))
                 ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);
but the spot I've added for RAW_DATA_CST missed this.

So, the following patch adds that.

2024-11-02  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117384
* gimplify.cc (gimplify_init_ctor_eval): Add VIEW_CONVERT_EXPR around
rctor if it doesn't have expected type.

* c-c++-common/init-7.c: New test.

c++/modules: Propagate TYPE_CANONICAL for partial specialisations [PR113814]

In some cases, when we go to import a partial specialisation there might
already be an incomplete implicit instantiation in the specialisation
table. This causes ICEs described in the linked PR as we now have two
separate matching specialisations for this same arguments with different
TYPE_CANONICAL.

We already support multiple specialisations with the same args however,
as they may be differently constrained. So we can solve this by simply
ensuring that the TYPE_CANONICAL of the new partial specialisation
matches the existing specialisation.

PR c++/113814

gcc/cp/ChangeLog:

* pt.cc (add_mergeable_specialization): Propagate
TYPE_CANONICAL.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-6.h: New test.
* g++.dg/modules/partial-6_a.H: New test.
* g++.dg/modules/partial-6_b.H: New test.
* g++.dg/modules/partial-6_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Co-authored-by: Jason Merrill <jason@redhat.com>

c++/modules: Fix recursive dependencies [PR116317]

In cases like the linked PR we sometimes get mutually recursive
dependencies that both rely on the other to have been streamed as part
of their merge key information.  In the linked PR, this causes an ICE.

The root cause is that 'sort_cluster' is not correctly ordering the
dependencies; both the element_t specialisation and the
reverse_adaptor::first function decl depend on each other, but by
streaming element_t first it ends up trying to stream itself recursively
as part of calculating its own merge key, which apart from the checking
ICE will also cause issues on stream-in, as the merge key will not
properly stream.

There is a comment already in 'sort_cluster' describing this issue, but
it says:

     Finding the single cluster entry dep is very tricky and
     expensive.  Let's just not do that.  It's harmless in this case
     anyway.

However in this case it was not harmless: it's just somewhat luck that
the sorting happened to work for the existing cases in the testsuite.

This patch solves the issue by noting any declarations that rely on deps
first seen within their own merge key.  This declaration gets marked as
an "entry" dep; any of these deps that end up recursively referring back
to that entry dep as part of their own merge key do not.

Then within sort_cluster we can ensure that the entry dep is written to
be streamed first of its cluster; this will ensure that any other deps
are just emitted as back-references, and the mergeable dep itself will
structurally decompose.

PR c++/116317

gcc/cp/ChangeLog:

* module.cc
(depset::DB_MAYBE_RECURSIVE_BIT): New flag.
(depset::DB_ENTRY_BIT): New flag.
(depset::is_maybe_recursive): New accessor.
(depset::is_entry): New accessor.
(depset::hash::writing_merge_key): New field.
(trees_out::decl_value): Inform dep_hash while we're writing the
merge key information for a decl.
(depset::hash::add_dependency): Find recursive deps and mark the
entry point.
(sort_cluster): Ensure that the entry dep is streamed first.

gcc/testsuite/ChangeLog:

* g++.dg/modules/late-ret-4_a.H: New test.
* g++.dg/modules/late-ret-4_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

[committed] ft32 doesn't support trampolines.

The ft32 has never supported trampolines, but the target supports bits weren't
appropriately updated.

Fixed thusly.

gcc/testsuite
* lib/target-supports.exp (check_effective_target_trampolines): ft32
does not support trampolines.

[committed] Make LRA default for ft32 and remove -mlra option

I was looking to clean up an old patch I'm carrying in my tester.  My first
thought was that ft32 was likely going to be deprecated because it wasn't using
LRA -- which in turn would mean the patch in question could just be removed.

But then I checked, ft32 has an LRA option and if turned on it gets the exact
same test results as with reload.  While the port mentions a failure with
sieve.c, that's been there since the port was introduced in 2015.

It's working well enough that I think just converting it is the right thing to
do.  The testsuite patch which precipitated this one will follow separately.
I've kept the -mlra option for compatibility sake, but it's ignored.

Pushing to the trunk.

gcc/
* config/ft32/ft32.cc (ft32_lra_p): Remove.
(TARGET_LRA_P): Likewise.
* config/ft32/ft32.opt: Make -mlra ignored.
* doc/invoke.texi: Adjust documentation for -mlra on ft32.

analyzer: use std::unique_ptr in "to_json" functions

No functional change intended.

gcc/analyzer/ChangeLog:
* analyzer.cc: Include "make-unique.h". Convert "to_json"
functions to use std::unique_ptr.
* call-string.cc: Likewise.
* constraint-manager.cc: Likewise.
* diagnostic-manager.cc: Likewise.
* engine.cc: Likewise.
* program-point.cc: Likewise.
* program-state.cc: Likewise.
* ranges.cc: Likewise.
* region-model.cc: Likewise.
* region.cc: Likewise.
* svalue.cc: Likewise.
* sm.cc: Likewise.
* store.cc: Likewise.
* supergraph.cc: Likewise.
* analyzer.h: Convert "to_json" functions to return
std::unique_ptr.
* call-string.h: Likewise.
* constraint-manager.h: Likewise.
(bounded_range::set_json_attr): Pass "obj" by reference.
* diagnostic-manager.h: Convert "to_json" functions to return
std::unique_ptr.
* exploded-graph.h: Likewise.
* program-point.h: Likewise.
* program-state.h: Likewise.
* ranges.h: Likewise.
* region-model.h: Likewise.
* region.h: Likewise.
* sm.h: Likewise.
* store.h: Likewise.
* supergraph.h: Likewise.
* svalue.h: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Daily bump.

builtins: Fix expand_builtin_prefetch [PR117407]

On Fri, Nov 01, 2024 at 04:47:35PM +0800, Haochen Jiang wrote:
>         * builtins.cc (expand_builtin_prefetch): Use IN_RANGE to
>       avoid second usage of INTVAL.

I doubt this has been actually tested.

> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -1297,7 +1297,7 @@ expand_builtin_prefetch (tree exp)
>    else
>      op1 = expand_normal (arg1);
>    /* Argument 1 must be 0, 1 or 2.  */
> -  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
> +  if (IN_RANGE (INTVAL (op1), 0, 2))
>      {
>        warning (0, "invalid second argument to %<__builtin_prefetch%>;"
>              " using zero");
> @@ -1315,7 +1315,7 @@ expand_builtin_prefetch (tree exp)
>    else
>      op2 = expand_normal (arg2);
>    /* Argument 2 must be 0, 1, 2, or 3.  */
> -  if (INTVAL (op2) < 0 || INTVAL (op2) > 3)
> +  if (IN_RANGE (INTVAL (op2), 0, 3))
>      {
>        warning (0, "invalid third argument to %<__builtin_prefetch%>; using zero");
>        op2 = const0_rtx;

because it inverts the tests, previously it was warning when op1 wasn't
0, 1, 2, now it warns when it is 0, 1 or 2, previously it was warning
when op2 wasn't 0, 1, 2 or 3, now it warns when it is 0, 1, 2, or 3.

Fixed thusly.

2024-11-01  Jakub Jelinek  <jakub@redhat.com>

PR bootstrap/117407
* builtins.cc (expand_builtin_prefetch): Use !IN_RANGE rather
than IN_RANGE.

libstdc++: Check feature test macros in unordered containers

Replace some `__cplusplus > 201402L` preprocessor checks with more
expressive checks for the appropriate feature test macros.

libstdc++-v3/ChangeLog:

* include/bits/unordered_map.h: Check __glibcxx_node_extract and
__glibcxx_unordered_map_try_emplace instead of __cplusplus.
* include/bits/unordered_set.h: Check __glibcxx_node_extract
instead of __cplusplus.

libstdc++: Minor comment improvements in <bits/hashtable.h>

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Improve comments.

libstdc++: Add P1206R7 from_range members to std::list and std::forward_list [PR111055]

This is another piece of P1206R7, adding new members to std::list and
std::forward_list.

libstdc++-v3/ChangeLog:

PR libstdc++/111055
* include/bits/forward_list.h
(forward_list(from_range, R&&, const Alloc&), assign_range)
(prepend_range, insert_range_after): Define.
* include/bits/stl_list.h (list(from_range, R&&, const Alloc&))
(assign_range, prepend_range, append_range, insert_range):
Define.
* include/debug/forward_list
(forward_list(from_range, R&&, const Alloc&), assign_range)
(prepend_range, insert_range_after): Define.
* include/debug/list (list(from_range, R&&, const Alloc&))
(assign_range, prepend_range, append_range, insert_range):
Define.
* testsuite/23_containers/forward_list/cons/from_range.cc: New
test.
* testsuite/23_containers/forward_list/modifiers/assign_range.cc:
New test.
* testsuite/23_containers/forward_list/modifiers/insert_range_after.cc:
New test.
* testsuite/23_containers/forward_list/modifiers/prepend_range.cc:
New test.
* testsuite/23_containers/list/cons/from_range.cc: New test.
* testsuite/23_containers/list/modifiers/append_range.cc: New
test.
* testsuite/23_containers/list/modifiers/assign/assign_range.cc:
New test.
* testsuite/23_containers/list/modifiers/insert/insert_range.cc:
New test.
* testsuite/23_containers/list/modifiers/prepend_range.cc: New
test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

Update bitwise_or op_range.

If the LHS of a bitwise OR is positive, then so are both operands when
using op1_range or op2_range.

gcc/
* range-op.cc (operator_bitwise_or::op1_range): If LHS is signed
positive, so are both operands.

gcc/testsuite
* g++.dg/cpp23/attr-assume-opt.C (f2b): Alternate flow test.

Reimplement 'assume' processing pass.

Rework the assume pass to work properly and fail conservatively when it
does. Also move it to its own file.

PR tree-optimization/117287
gcc/
* Makefile.in (IBJS): Add tree-assume.o
* gimple-range.cc (assume_query::assume_range_p): Remove.
(assume_query::range_of_expr): Remove.
(assume_query::assume_query): Move to tree-assume.cc.
(assume_query::~assume_query): Remove.
(assume_query::calculate_op): Move to tree-assume.cc.
(assume_query::calculate_phi): Likewise.
(assume_query::check_taken_edge): Remove.
(assume_query::calculate_stmt): Move to tree-assume.cc.
(assume_query::dump): Remove.
* gimple-range.h (class assume_query): Move to tree-assume.cc
* tree-assume.cc: New
* tree-vrp.cc (struct pass_data_assumptions): Move to tree-assume.cc.
(class pass_assumptions): Likewise.
(make_pass_assumptions): Likewise.

gcc/testsuite/
* g++.dg/cpp23/pr117287-attr.C: New.

Make fur_edge accessible.

Move the decl of fur_edge out of the source file into the header file.

* gimple-range-fold.cc (class fur_edge): Relocate from here.
(fur_edge::fur_edge): Also move to:
* gimple-range-fold.h (class fur_edge): Relocate to here.
(fur_edge::fur_edge): Likewise.

c++: Adjust docs and option descriptions for the publishing of C++23

Now that C++23 has been finally published, the following patch attempts
to mention it in the option descriptions and documentation.
Given that it has been published about 1.5 years after being finalized
and has the 14882:2024 document number pair rather than :2023, I wasn't
sure when exactly to use 2023 (as informal name) and when 2024 (as year
of publishing), so I've tried to use 2024 in standards.texi which talks
more formally about the standards and a note that it has been published
in 2024 when it is talked about more informally.
I remember at least one older edition has been published in January too,
but the ISO pages pretend it was published still in December of the previous
year, in this case it doesn't.

2024-11-01 Jakub Jelinek <jakub@redhat.com>

gcc/
* doc/standards.texi (C++ Language): Mention also the 2024
revision and -std=gnu++23 option.
* doc/invoke.texi (-std=): Adjust description of c++23, c++2b,
gnu++23 and gnu++2b now that ISO C++ 14882:2024 is published.
gcc/c-family/
* c.opt (std=c++2b, std=c++23, std=gnu++2b, std=gnu++23): Adjust
description now that ISO C++ 14882:2024 is published.

c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be Macros [PR114461]

This is an attempt to implement the https://wg21.link/p3034r1 paper,
but I'm afraid the wording in the paper is bad for multiple reasons.
I think I understand the intent, that the module name and partition
if any shouldn't come from macros so that they can be scanned for
without preprocessing, but on the other side doesn't want to disable
macro expansion in pp-module altogether, because e.g. the optional
attribute in module-declaration would be nice to come from macros
as which exact attribute is needed might need to be decided based on
preprocessor checks.
The paper added https://eel.is/c++draft/cpp.module#2
which uses partly the wording from https://eel.is/c++draft/cpp.module#1

The first issue I see is that using that "defined as an object-like macro"
from there means IMHO something very different in those 2 paragraphs.
As per https://eel.is/c++draft/cpp.pre#7.sentence-1 preprocessing tokens
in preprocessing directives aren't subject to macro expansion unless
otherwise stated, and so the export and module tokens aren't expanded
and so the requirement that they aren't defined as an object-like macro
makes perfect sense.  The problem with the new paragraph is that
https://eel.is/c++draft/cpp.module#3.sentence-1 says that the rest of
the tokens are macro expanded and after macro expansion none of the
tokens can be defined as an object-like macro, if they would be, they'd
be expanded to that.  So, I think either the wording needs to change
such that not all preprocessing tokens after module are macro expanded,
only those which are after the pp-module-name and if any pp-module-partition
tokens, or all tokens after module are macro expanded but none of the tokens in
pp-module-name and pp-module-partition if any must come from macro
expansion.  The patch below implements it as if the former would be
specified (but see later), so essentially scans the preprocessing tokens
after module without expansion, if the first one is an identifier, it
disables expansion for it and then if followed by . or : expects another
such identifier (again with disabled expansion), but stops after second
: is seen.

Second issue is that while the global-module-fragment start is fine, matches
the syntax of the new paragraph where the pp-tokens[opt] aren't present,
there is also private-module-fragment in the syntax where module is
followed by : private ; and in that case the colon doesn't match the
pp-module-name grammar and appears now to be invalid.  I think the
https://eel.is/c++draft/cpp.module#2
paragraph needs to change so that it allows also that pp-tokens of
a pp-module may also be : pp-tokens[opt] (and in that case, I think
the colon shouldn't come from a macro and private and/or ; can).

Third issue is that there are too many pp-tokens in
https://eel.is/c++draft/cpp.module , one is all the tokens between
module keyword and the semicolon and one is the optional extra tokens
after pp-module-partition (if any, if missing, after pp-module).
Perhaps introducing some other non-terminal would help talking about it?
So in "where the pp-tokens (if any) shall not begin with a ( preprocessing
token" it isn't obvious which pp-tokens it is talking about (my assumption
is the latter) and also whether ( can't appear there just before macro
expansion or also after expansion.  The patch expects only before expansion,
so
#define F ();
export module foo F
would be valid during preprocessing but obviously invalid during
compilation, but
#define foo(n) n;
export module foo (3)
would be invalid already during preprocessing.

The last issue applies only if the first issue is resolved to allow
expansion of tokens after : if first token, or after pp-module-partition
if present or after pp-module-name if present.  When non-preprocessing
scanner sees
export module foo.bar:baz.qux;
it knows nothing can come from preprocessing macros and is ok, but if it
sees
export module foo.bar:baz qux
then it can't know whether it will be
export module foo.bar:baz;
or
export module foo.bar:baz [[]];
or
export module foo.bar:baz.freddy.garply;
because qux could be validly a macro, which expands to ; or [[]];
or .freddy.garply; etc.  So, either the non-preprocessing scanner would
need to note it as possible export of foo.bar:baz* module partitions
and preprocess if it needs to know the details or just compile, or if that
is not ok, the wording would need to rule out that the expansion of (the
second) pp-tokens if any can't start with . or : (colon would be only
problematic if it isn't present in the tokens before it already).
So, if e.g. defining qux above to . whatever is invalid, then the scanner
can rely it sees the whole module name and partition.

The patch below implements what is above described as the first variant
of the first issue resolution, i.e. disables expansion of as many tokens
as could be in the valid module name and module partition syntax, but
as soon as it e.g. sees two adjacent identifiers, the second one can be
macro expanded.  If it is macro expanded though, the expansion can't
start with . or :, and if it expands to nothing, tokens after it (whether
they come from macro expansion or not) can't start with . or :.
So, effectively:
#define SEMI ;
export module SEMI
used to be valid and isn't anymore,
#define FOO bar
export module FOO;
isn't valid,
#define COLON :
export module COLON private;
isn't valid,
#define BAR baz
export module foo.bar:baz.qux.BAR;
isn't valid,
#define BAZ .qux
export module foo BAZ;
isn't valid,
#define FREDDY :garply
export module foo FREDDY;
isn't valid,
while
#define QUX [[]]
export module foo QUX;
or
#define GARPLY private
module : GARPLY;
etc. is.

2024-11-01  Jakub Jelinek  <jakub@redhat.com>

PR c++/114461
libcpp/
* include/cpplib.h: Implement C++26 P3034R1
- Module Declarations Shouldn’t be Macros (or more precisely
its expected intent).
(NO_DOT_COLON): Define.
* internal.h (struct cpp_reader): Add diagnose_dot_colon_from_macro_p
member.
* lex.cc (cpp_maybe_module_directive): For pp-module, if
module keyword is followed by CPP_NAME, ensure all CPP_NAME
tokens possibly matching module name and module partition
syntax aren't expanded and aren't defined as object-like macros.
Verify first token after that doesn't start with open paren.
If the next token after module name/partition is CPP_NAME defined
as macro, set NO_DOT_COLON flag on it.
* macro.cc (cpp_get_token_1): Set
pfile->diagnose_dot_colon_from_macro_p if token to be expanded has
NO_DOT_COLON bit set in flags.  Before returning, if
pfile->diagnose_dot_colon_from_macro_p is true and not returning
CPP_PADDING or CPP_COMMENT and not during macro expansion preparation,
set pfile->diagnose_dot_colon_from_macro_p to false and diagnose
if returning CPP_DOT or CPP_COLON.
gcc/testsuite/
* g++.dg/modules/cpp-7.C: New test.
* g++.dg/modules/cpp-8.C: New test.
* g++.dg/modules/cpp-9.C: New test.
* g++.dg/modules/cpp-10.C: New test.
* g++.dg/modules/cpp-11.C: New test.
* g++.dg/modules/cpp-12.C: New test.
* g++.dg/modules/cpp-13.C: New test.
* g++.dg/modules/cpp-14.C: New test.
* g++.dg/modules/cpp-15.C: New test.
* g++.dg/modules/cpp-16.C: New test.
* g++.dg/modules/cpp-17.C: New test.
* g++.dg/modules/cpp-18.C: New test.
* g++.dg/modules/cpp-19.C: New test.
* g++.dg/modules/cpp-20.C: New test.
* g++.dg/modules/pmp-4.C: New test.
* g++.dg/modules/pmp-5.C: New test.
* g++.dg/modules/pmp-6.C: New test.
* g++.dg/modules/token-6.C: New test.
* g++.dg/modules/token-7.C: New test.
* g++.dg/modules/token-8.C: New test.
* g++.dg/modules/token-9.C: New test.
* g++.dg/modules/token-10.C: New test.
* g++.dg/modules/token-11.C: New test.
* g++.dg/modules/token-12.C: New test.
* g++.dg/modules/token-13.C: New test.
* g++.dg/modules/token-14.C: New test.
* g++.dg/modules/token-15.C: New test.
* g++.dg/modules/token-16.C: New test.
* g++.dg/modules/dir-only-3.C: Expect an error.
* g++.dg/modules/dir-only-4.C: Expect an error.
* g++.dg/modules/dir-only-5.C: New test.
* g++.dg/modules/atom-preamble-2_a.C: In export module malcolm;
replace malcolm with kevin.  Don't define malcolm macro.
* g++.dg/modules/atom-preamble-4.C: Expect an error.
* g++.dg/modules/atom-preamble-5.C: New test.

Use IN_RANGE in prefetch builtin

These are the last minute changes that should apply to MOVRS patch but
disappeared in patch.

Using IN_RANGE will avoid second usage of INTVAL for prefetch check.

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Use IN_RANGE to
avoid second usage of INTVAL.

i386: Do not allow pointer conversion for CMPccXADD intrin under -O0

The pointer conversion to wider type under macro would not consider
whether the higher bit is cleaned or not. It will lead to unexpected
cmp result.

After this change, it will throw an incompatible pointer type error just
like -O2 does currently.

gcc/ChangeLog:

* config/i386/cmpccxaddintrin.h (_cmpccxadd_epi32): Do not do
type conversion for pointer.
(_cmpccxadd_epi64): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cmpccxadd-1b.c: New test.

testsuite: Fix up builtin-prefetch-1.c tests

How can you use "read-shared" as an identifier? It's not allowed by all
C standard versions.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/builtin-prefetch-1.c (rws): Use
"read_shared" instead of "read-shared" as the identifier for
enum value.
* gcc.dg/builtin-prefetch-1.c (rws): Likewise.

LoongArch: testsuite: Add -O for jump-table-annotate.c

Without optimization, GCC does not emit a jump table for the test case.

I'm not sure if the test case has been wrong in the first place or
something has changed in these months...

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/jump-table-annotate.c
(dg-additional-options): Add -O.

c++: Add testcase for now fixed issue [PR101887]

The testcase in PR101887 has been working since the fix for PR104846,
via r12-7599-gac8310dd122172.

This patch simply adds the case to the testsuite.

PR c++/101887

gcc/testsuite/ChangeLog:

* g++.dg/init/delete5.C: Add testcase from PR c++/101887.

Always set SECTION_RELRO for or .data.rel.ro{,.local} [PR116887]

At least two ports (hppa and loongarch) need to set SECTION_RELRO for
.data.rel.ro{,.local} in section_type_flags (PR52999 and PR116887), and
I cannot see a reason not to just set it in the generic code.

With this applied we can also remove the hppa-specific
pa_section_type_flags in a future patch.

gcc/ChangeLog:

PR target/116887
* varasm.cc (default_section_type_flags): Always set
SECTION_RELRO if name is .data.rel.ro{,.local}.

gcc/testsuite/ChangeLog:

PR target/116887
* gcc.dg/pr116887.c: New test.

analyzer: fix -Wunused-parameter warning [PR117373]

gcc/analyzer/ChangeLog:
PR analyzer/117373
* infinite-loop.cc
(infinite_loop_diagnostic::describe_final_event): Fix
-Wunused-parameter warning

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Use LC_ALL=C when running selftests [PR117361]

gcc/ChangeLog:
PR bootstrap/117361
* Makefile.in (GCC_FOR_SELFTESTS): New.

gcc/c/ChangeLog:
PR bootstrap/117361
* Make-lang.in (s-selftest-c): Use GCC_FOR_SELFTESTS.
(selftest-c-gdb): Likewise.
(selftest-c-valgrind): Likewise.

gcc/cp/ChangeLog:
PR bootstrap/117361
* Make-lang.in (s-selftest-c++): Use GCC_FOR_SELFTESTS.
(selftest-c++-gdb): Likewise.
(selftest-c++-valgrind): Likewise.

gcc/rust/ChangeLog:
PR bootstrap/117361
* Make-lang.in (s-selftest-rust): Use GCC_FOR_SELFTESTS.
(selftest-rust-gdb): Likewise.
(selftest-rust-valgrind): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Add missing <vector> header to unordered_set/pr115285.cc test

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_set/pr115285.cc: Include
missing header for std::vector.

libstdc++: Remove stray whitespace in #endif

This isn't nested within another #if group so shouldn't be indented like
this.

libstdc++-v3/ChangeLog:

* libsupc++/typeinfo: Remove whitespace in #endif

Fix -mod(unsigned, unsigned).

gcc/fortran/ChangeLog:

* resolve.cc (resolve_operator): Also handle BT_UNSIGNED.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_38.f90: Add -pedantic and adjust error
message.
* gfortran.dg/unsigned_40.f90: New test.

openmp: Return error_mark_node from tsubst_attribute for errneous varid

We incorrectly accept some invalid declare variant cases as if declare
variant wasn't there, in particular if a function template has some dependent
arguments and variant name lookup fails, because that is during
fn_type_unification with complain=tf_none, it just sets it to error_mark_node
and doesn't complain further, because it doesn't know the substitution failed
(we don't return error_mark_node from tsubst_attribute, just create TREE_LIST
with error_mark_node TREE_PURPOSE).

The following patch fixes it by returning error_mark_node in that case, then
fn_type_unification caller can see it failed and can redo it with explain_p
so that errors are reported.

2024-11-01 Jakub Jelinek <jakub@redhat.com>

* pt.cc (tsubst_attribute): For "omp declare variant base" attribute
if varid is error_mark_node, set val to error_mark_node rather than
creating a TREE_LIST with error_mark_node TREE_PURPOSE.

* g++.dg/gomp/declare-variant-10.C: New test.

Fortran: Fix problems with substring selectors in ASSOCIATE [PR115700]

2024-11-01 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/115700
* resolve.cc (resolve_assoc_var): Extract a substring reference
with missing as well as non-constant start or end.

gcc/testsuite/
PR fortran/115700
* gfortran.dg/associate_69.f90: Activate commented out tests.
* gfortran.dg/associate_70.f90: Test correct functioning of
references in associate_69.f90 tests.

Support Intel AMX-MOVRS

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-MOVRS.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_MOVRS_SET): New.
(OPTION_MASK_ISA2_AMX_MOVRS_UNSET): Ditto.
(ix86_handle_option): Handle -mamx-movrs.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_MOVRS.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-movrs.
* config.gcc: Add amxmovrsintrin.h.
* config/i386/cpuid.h (bit_AMX_MOVRS): New.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Define __AMX_MOVRS__.
* config/i386/i386-isa.def (AMX_MOVRS): Add DEF_PTA(AMX_MOVRS).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-movrs.
* config/i386/i386.opt: Add option -mamx-movrs.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxmovrsintrin.h
* doc/extend.texi: Document amx-movrs.
* doc/invoke.texi: Document -mamx-movrs.
* doc/sourcebuild.texi: Document target amx-movrs.
* config/i386/amxmovrsintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-movrs.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add new check for amx-movrs.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-movrs.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-movrs.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_movrs): New.
* gcc.target/i386/amxmovrs-asmatt-1.c: New test.
* gcc.target/i386/amxmovrs-asmintel-1.c: Ditto.
* gcc.target/i386/amxmovrs-t2rpntlvw-2.c: Ditto.
* gcc.target/i386/amxmovrs-tileloaddrs-2.c: Ditto.

Support Intel MOVRS

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Expand for
prefetchrst2.
* common/config/i386/cpuinfo.h (get_available_features): Detect movrs.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_MOVRS_SET): New.
(OPTION_MASK_ISA2_MOVRS_UNSET): Ditto.
(ix86_handle_option): Handle -mmovrs.
* common/config/i386/i386-cpuinfo.h
(enum processor_features): Add FEATURE_MOVRS.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for movrs.
* config.gcc: Add movrsintrin.h
* config/i386/cpuid.h (bit_MOVRS): New.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (CHAR, PCCHAR), (SHORT, PCSHORT), (INT, PCINT),
(INT64, PCINT64).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add
__MOVRS__.
* config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Define
__MOVRS__.
* config/i386/i386-isa.def (MOVRS): Add DEF_PTA(MOVRS)
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle movrs.
* config/i386/i386.md (movrs<mode>): New.
* config/i386/i386.opt: Add option -mmovrs.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include movrsintrin.h
* config/i386/sse.md (unspecv): Add UNSPEC_VMOVRS.
(VI1248_AVX10_2): New.
(avx10_2_movrs_vmovrs<ssemodesuffix><mode><mask_name>): New define_insn.
* config/i386/xmmintrin.h: Add prefetchrst2.
* doc/extend.texi: Document movrs.
* doc/invoke.texi: Document -mmovrs.
* doc/rtl.texi: Document extension of prefetchrst2.
* doc/sourcebuild.texi: Document target movrs.
* config/i386/movrsintrin.h: New.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mmovrs.
* g++.dg/other/i386-3.C: Ditto.
* gcc.c-torture/execute/builtin-prefetch-1.c: Expand rws.
* gcc.dg/builtin-prefetch-1.c: Ditto.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mmovrs.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add movrs.
* gcc.target/i386/sse-23.c: Ditto
* gcc.target/i386/avx10_2-512-movrs-1.c: New test.
* gcc.target/i386/avx10_2-movrs-1.c: Ditto.
* gcc.target/i386/movrs-1.c: Ditto.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>

Support Intel AMX-FP8

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_available_features): Detect amx-fp8.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_FP8_SET): New macros.
(OPTION_MASK_ISA2_AMX_FP8_UNSET): Ditto.
(ix86_handle_option): Handle -mamx-fp8.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_FP8.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-fp8.
* config.gcc: Add amxfp8intrin.h.
* config/i386/cpuid.h (bit_AMX_FP8): New.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Define __AMX_FP8__.
* config/i386/i386-isa.def (AMX_FP8): Add DEF_PTA for AMX_FP8.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Add new ATTR.
* config/i386/i386.opt: Add -mamx-fp8.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxfp8intrin.h.
* doc/extend.texi: Document -mamx-fp8.
* doc/invoke.texi: Document -mamx-fp8.
* doc/sourcebuild.texi: Document -mamx-fp8.
* config/i386/amxfp8intrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-fp8.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Check for amx-fp8.
* gcc.target/i386/amx-helper.h: Ditto.
* gcc.target/i386/fp8-helper.h: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-fp8.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: New proc.
* gcc.target/i386/amxfp8-asmatt-1.c: New test.
* gcc.target/i386/amxfp8-asmintel-1.c: Ditto.
* gcc.target/i386/amxfp8-dpbf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dpbhf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dphbf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dphf8ps-2.c: Ditto.
* gcc.target/i386/fp-emulation.h: Emulates NaN behaviour.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

Support Intel AMX-TRANSPOSE

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-TRANSPOSE.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_TRANSPOSE_SET,
OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET): New.
(ix86_handle_option): Handle -mamx-transpose.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_TRANSPOSE.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-transpose.
* config.gcc: Add amxtransposeintrin.h.
* config/i386/cpuid.h (bit_AMX_TRANSPOSE): New.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AMX_TRANSPOSE__.
* config/i386/i386-isa.def (AMX_TRANSPOSE): Add
DEF_PTA(AMX_TRANSPOSE).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-transpose.
* config/i386/i386.opt: Add option -mamx-transpose.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxtransposeintrin.h.
* doc/extend.texi: Document amx-transpose.
* doc/invoke.texi: Document -mamx-transpose.
* doc/sourcebuild.texi: Document target amx-transpose.
* config/i386/amxtransposeintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-transpose.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add new check for amx-transpose.
(__tilepair): New.
(zero_pair_tile_src): New.
(check_pair_tile_register): New.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/amx-helper.h: Add amx-transpose support.
(init_pair_tile_src): New function.
* gcc.target/i386/sse-12.c: Add -mamx-tranpose.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-transpose.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_transposed): New.
* gcc.target/i386/amxtranspose-asmatt-1.c: New test.
* gcc.target/i386/amxtranspose-asmintel-1.c: Ditto.
* gcc.target/i386/amxtranspose-2rpntlvw-2.c: Ditto.
* gcc.target/i386/amxtranspose-conjtcmmimfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-conjtfp16-2.c: Ditto.
* gcc.target/i386/amxtranspose-tcmmimfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tcmmrlfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tdpbf16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tdpfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tmmultf32ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-transposed-2.c: Ditto.