libiberty: Fix build with GCC < 7
Jakub Jelinek [Tue, 5 Dec 2023 22:32:19 +0000 (23:32 +0100)] 
libiberty: Fix build with GCC < 7

Tobias reported on IRC that the linker fails to build with GCC 4.8.5.
In configure I've tried to use everything actually used in the sha1.c
x86 hw implementation, but unfortunately I forgot about implicit function
declarations.  GCC before 7 did have <cpuid.h> header and bit_SHA define
and __get_cpuid function defined inline, but it didn't define
__get_cpuid_count, which compiled fine (and the configure test is
intentionally compile time only) due to implicit function declaration,
but then failed to link when linking the linker, because
__get_cpuid_count wasn't defined anywhere.

The following patch fixes that by using what autoconf uses in AC_CHECK_DECL
to make sure the functions are declared.
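
For illustration, a minimal sketch of that declaration check (hypothetical,
not the literal configure.ac test), assuming the usual <cpuid.h> interface:

    #include <cpuid.h>

    int
    main ()
    {
      // Referencing the names (autoconf's AC_CHECK_DECL trick) fails to
      // compile unless real declarations are visible; a plain call would
      // have been accepted via an implicit declaration on old compilers.
    #ifndef __get_cpuid
      (void) __get_cpuid;
    #endif
    #ifndef __get_cpuid_count
      (void) __get_cpuid_count;
    #endif
      unsigned int eax, ebx, ecx, edx;
      if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
        return 1;
      return (ebx & bit_SHA) ? 0 : 1;
    }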

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and
__get_cpuid_count are not implicitly declared.
* configure: Regenerated.

btf: avoid wrong DATASEC entries for extern vars [PR112849]
David Faust [Mon, 4 Dec 2023 22:08:03 +0000 (14:08 -0800)] 
btf: avoid wrong DATASEC entries for extern vars [PR112849]

The process of creating BTF_KIND_DATASEC records involves iterating
through variable declarations, determining which section they will be
placed in, and creating an entry in the appropriate DATASEC record
accordingly.

For variables without e.g. an explicit __attribute__((section)), we use
categorize_decl_for_section () to identify the appropriate named section
and corresponding BTF_KIND_DATASEC record.

This was incorrectly being done for 'extern' variable declarations as
well as non-extern ones, which meant that extern variable declarations
could result in BTF_KIND_DATASEC entries claiming the variable is
allocated in some section such as '.bss' without any knowledge of whether
that is actually true. That resulted in errors building the Linux kernel
BPF selftests.

This patch corrects btf_collect_datasec () to avoid assuming a section
for extern variables, and only emit BTF_KIND_DATASEC entries for them if
they have a known section.
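
A hypothetical reduced example of the distinction (not the new testcase
itself):

    extern int ext_var;   // extern declaration: no known section in this TU,
                          // so it should not get a BTF_KIND_DATASEC entry
    int local_var;        // defined here: genuinely allocated in .bss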

gcc/
PR debug/112849
* btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
entry in a BTF_KIND_DATASEC record for extern variable decls without
a known section.

gcc/testsuite/
PR debug/112849
* gcc.dg/debug/btf/btf-datasec-3.c: New test.

libgfortran: Fix -Wincompatible-pointer-types errors
Jakub Jelinek [Tue, 5 Dec 2023 21:56:41 +0000 (22:56 +0100)] 
libgfortran: Fix -Wincompatible-pointer-types errors

As reported, libgfortran fails to build on targets where int32_t and int
are different types, because it uses int vs. GFC_INTEGER_4 (under the hood
int32_t) interchangeably.

The following patch fixes that.
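
For illustration, a stand-alone sketch of the kind of mismatch involved
(hypothetical names; set_child_iostat and caller are stand-ins, not the
actual libgfortran declarations): when int32_t is not int, a local declared
as plain int cannot be passed where a GFC_INTEGER_4 * is expected, so the
locals themselves need the GFC_INTEGER_4 type.

    #include <cstdint>

    using GFC_INTEGER_4 = std::int32_t;   // libgfortran's 4-byte integer type

    // Hypothetical stand-in for a child-I/O callback parameter.
    static void set_child_iostat (GFC_INTEGER_4 *iostat) { *iostat = 0; }

    void
    caller ()
    {
      GFC_INTEGER_4 noiostat = 0;   // was plain 'int', which only happens to
                                    // match when int32_t is a typedef for int
      set_child_iostat (&noiostat);
    }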

2023-12-05  Florian Weimer  <fweimer@redhat.com>
    Jakub Jelinek  <jakub@redhat.com>

* io/list_read.c (list_formatted_read_scalar) <case BT_CLASS>:
Change types of unit and noiostat to GFC_INTEGER_4 from int, change
type of child_iostat to GFC_INTEGER_4 * from int *, formatting
fixes.
(nml_read_obj): Likewise.
* io/write.c (list_formatted_write_scalar) <case BT_CLASS>: Likewise.
(nml_write_obj): Likewise.
* io/transfer.c (unformatted_read, unformatted_write): Likewise.

c++: Further #pragma GCC unroll C++ fix [PR112795]
Jakub Jelinek [Tue, 5 Dec 2023 21:54:08 +0000 (22:54 +0100)] 
c++: Further #pragma GCC unroll C++ fix [PR112795]

When committing the #pragma GCC unroll patch, I found I forgot one spot
for diagnosing the invalid unrolls - if the #pragma GCC unroll argument is
dependent and the pragma is before a range for loop, the unroll tree (now;
previously one converted from ushort) is saved into RANGE_FOR_UNROLL and
tsubst_stmt was RECURing on it, but didn't diagnose if it was invalid, and
so we ICEd later in the middle-end when ANNOTATE_EXPR had an unexpected
argument.

The following patch fixes that.  So that the diagnostics aren't done in 3
different places, the patch introduces a new function that both
cp_parser_pragma_unroll and instantiation of ANNOTATE_EXPR and RANGE_FOR_STMT
can use.
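
A hypothetical reduced case along the lines of the new tests (not the
committed testcases verbatim): the unroll count is dependent and sits on a
range-based for loop, so its validity can only be checked at instantiation
time.

    template <int N>
    void
    f (int (&a)[4])
    {
    #pragma GCC unroll N
      for (int x : a)      // the dependent count is stored in RANGE_FOR_UNROLL
        (void) x;
    }

    void
    g (int (&a)[4])
    {
      f<-1> (a);   // invalid unroll count: must be diagnosed, not ICE later
    }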

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/112795
* cp-tree.h (cp_check_pragma_unroll): Declare.
* semantics.cc (cp_check_pragma_unroll): New function.
* parser.cc (cp_parser_pragma_unroll): Use cp_check_pragma_unroll.
* pt.cc (tsubst_expr) <case ANNOTATE_EXPR>: Likewise.
(tsubst_stmt) <case RANGE_FOR_STMT>: Likewise.

* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
-std=gnu++98.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/ext/unroll-7.C: New test.
* g++.dg/ext/unroll-8.C: New test.

rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]
Jakub Jelinek [Tue, 5 Dec 2023 20:39:31 +0000 (21:39 +0100)] 
rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

The middle-end has been changed quite recently to canonicalize
-abs (x) to copysign (x, -1) rather than the other way around.
While I agree with that at GIMPLE level, since it matches the GIMPLE
goal of as few operations as possible for a canonical form (-abs (x)
is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
I don't really like that being done on RTL as well (or at least
not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
because on most targets most floating point constants need to be loaded
from memory; there are a few exceptions, but -1 is often not one of them.

Anyway, the following patch fixes the rs6000 regression caused by the
change in GIMPLE canonicalization (i.e. the desirable one).  rs6000
clearly prefers the -abs (x) form because it has a single instruction for
that, while its copysign instruction requires loading the -1 from memory.
So the patch just ensures the copysign expander can actually see the
floating point constant and in that case emits the -abs (x) code (or, in
the hypothetical case of copysign with a non-negative constant, abs (x) -
but there copysign (x, 1) in GIMPLE is canonicalized to abs (x));
otherwise it forces the operand into the expected gpc_reg_operand
and does what it did before.
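
In source terms, the two forms in question are roughly (a sketch using the
generic GCC builtins):

    // Both compute the same value.  GIMPLE now canonicalizes neg_abs to the
    // copysign form; the patched rs6000 expander turns a copysign whose
    // second operand is a negative constant back into -abs, a single
    // instruction that needs no constant-pool load.
    double neg_abs (double x)      { return -__builtin_fabs (x); }
    double via_copysign (double x) { return __builtin_copysign (x, -1.0); }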

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR target/112606
* config/rs6000/rs6000.md (copysign<mode>3): Change predicate
of the last argument from gpc_reg_operand to any_operand.  If
operands[2] is CONST_DOUBLE, emit abs or neg abs depending on
its sign, otherwise if it doesn't satisfy gpc_reg_operand,
force it to REG using copy_to_mode_reg.

Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]
Harald Anlauf [Mon, 4 Dec 2023 21:44:53 +0000 (22:44 +0100)] 
Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]

gcc/fortran/ChangeLog:

PR fortran/100988
* gfortran.h (IS_PROC_POINTER): New macro.
* trans-types.cc (gfc_sym_type): Use macro in determination if the
restrict qualifier can be used for a dummy variable.  Fix logic to
allow the restrict qualifier also for optional arguments, and to
not apply it to pointer or proc_pointer arguments.

gcc/testsuite/ChangeLog:

PR fortran/100988
* gfortran.dg/coarray_poly_6.f90: Adjust pattern.
* gfortran.dg/coarray_poly_7.f90: Likewise.
* gfortran.dg/coarray_poly_8.f90: Likewise.
* gfortran.dg/missing_optional_dummy_6a.f90: Likewise.
* gfortran.dg/pr100988.f90: New test.

Co-authored-by: Tobias Burnus <tobias@codesourcery.com>

Restore build with GCC 4.8 to GCC 5
Richard Sandiford [Tue, 5 Dec 2023 17:53:50 +0000 (17:53 +0000)] 
Restore build with GCC 4.8 to GCC 5

GCC 5 and earlier applied array-to-pointer decay too early,
which affected the new attribute namespace code.  A reduced
example of the construct that the attribute code uses is:

    struct S { template<__SIZE_TYPE__ N> S(int (&)[N]); };
    struct T { int a; S b; };
    int a[] = { 1 };
    T t = { 1, a };

This was fixed by f85e1317f8ea933f5c615680353bd646f480f7d3
(PR 16333 et al).

This patch tries to add a minimally-invasive workaround.
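
On the reduced example above, the shape of the workaround is roughly as
follows (a sketch; the actual patch adds the braces in the attribute-table
initializers):

    struct S { template<__SIZE_TYPE__ N> S(int (&)[N]) {} };
    struct T { int a; S b; };
    int a[] = { 1 };
    T t = { 1, { a } };   // extra braces keep GCC 4.8-5 from decaying 'a' early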

gcc/ada/
* gcc-interface/utils.cc (gnat_internal_attribute_table): Add extra
braces to work around PR 16333 in older compilers.

gcc/
* attribs.cc (handle_ignored_attributes_option): Add extra
braces to work around PR 16333 in older compilers.
* config/aarch64/aarch64.cc (aarch64_gnu_attribute_table): Likewise.
(aarch64_arm_attribute_table): Likewise.
* config/arm/arm.cc (arm_gnu_attribute_table): Likewise.
* config/i386/i386-options.cc (ix86_gnu_attribute_table): Likewise.
* config/ia64/ia64.cc (ia64_gnu_attribute_table): Likewise.
* config/rs6000/rs6000.cc (rs6000_gnu_attribute_table): Likewise.
* target-def.h (TARGET_GNU_ATTRIBUTES): Likewise.
* genhooks.cc (emit_init_macros): Likewise, when emitting the
instantiation of TARGET_ATTRIBUTE_TABLE.
* langhooks-def.h (LANG_HOOKS_INITIALIZER): Likewise, when
instantiating LANG_HOOKS_ATTRIBUTE_TABLE.
(LANG_HOOKS_ATTRIBUTE_TABLE): Define to be empty by default.
* target.def (attribute_table): Likewise.

gcc/c-family/
* c-attribs.cc (c_common_gnu_attribute_table): Add extra
braces to work around PR 16333 in older compilers.

gcc/c/
* c-decl.cc (std_attribute_table): Add extra braces to work
around PR 16333 in older compilers.

gcc/cp/
* tree.cc (cxx_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.

gcc/d/
* d-attribs.cc (d_langhook_common_attribute_table): Add extra braces
to work around PR 16333 in older compilers.
(d_langhook_gnu_attribute_table): Likewise.

gcc/fortran/
* f95-lang.cc (gfc_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.

gcc/jit/
* dummy-frontend.cc (jit_gnu_attribute_table): Add extra braces
to work around PR 16333 in older compilers.
(jit_format_attribute_table): Likewise.

gcc/lto/
* lto-lang.cc (lto_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.
(lto_format_attribute_table): Likewise.

libstdc++: Disable std::formatter::set_debug_format [PR112832]
Jonathan Wakely [Mon, 4 Dec 2023 12:03:28 +0000 (12:03 +0000)] 
libstdc++: Disable std::formatter::set_debug_format [PR112832]

All set_debug_format member functions should be guarded by the
__cpp_lib_formatting_ranges macro (which is not defined yet).
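
A hypothetical compile-time check in the spirit of the new test (not the
testcase verbatim): while __cpp_lib_formatting_ranges is undefined, no
formatter specialization should expose the member.

    #include <format>

    template<typename F>
    concept has_set_debug_format
      = requires (F f) { f.set_debug_format (); };

    // Holds for std::formatter specializations until
    // __cpp_lib_formatting_ranges is defined.
    static_assert (!has_set_debug_format<std::formatter<int, char>>);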

libstdc++-v3/ChangeLog:

PR libstdc++/112832
* include/std/format (formatter::set_debug_format): Ensure this
member is defined conditionally for all specializations.
* testsuite/std/format/formatter/112832.cc: New test.

libstdc++: Add test for LWG Issue 3897
Will Hawkins [Mon, 4 Dec 2023 20:59:44 +0000 (20:59 +0000)] 
libstdc++: Add test for LWG Issue 3897

Add a test to verify that the implementation of inout_ptr is not
vulnerable to LWG Issue 3897.

libstdc++-v3/ChangeLog:

* testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: Add check
for LWG Issue 3897.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>

c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]
Jakub Jelinek [Tue, 5 Dec 2023 16:38:46 +0000 (17:38 +0100)] 
c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]

It seems that in 2017 attribute-specifier-seq[opt] was added to asm-declaration
and the change was voted in as a DR.

The following patch implements it by parsing the attributes and warning
about them.

I found one attribute parsing bug I'll send a fix for momentarily.

And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or whatever the attribute chain variable is called) is
non-NULL, but only if it is non-NULL and contains at least one
non-attribute_ignored_p
attribute?  cp_parser_declaration at least tries:
      if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
        warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
                    OPT_Wattributes, "attribute ignored");
but attribute_ignored_p here checks the first attribute rather than the
whole chain.  So it will incorrectly not warn if there is an ignored
attribute followed by a non-ignored one.
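
For reference, a hypothetical example of what DR 2262 now permits (not the
committed dr2262.C test): attributes before a block-scope asm-definition are
parsed and, with -Wattributes, warned about as ignored rather than rejected.

    void
    f ()
    {
      [[foo::bar]] asm ("nop");   // accepted; -Wattributes warns it is ignored
    }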

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/110734
* parser.cc (cp_parser_block_declaration): Implement C++ DR 2262
- Attributes for asm-definition.  Call cp_parser_asm_definition
even if RID_ASM token is only seen after sequence of standard
attributes.
(cp_parser_asm_definition): Parse standard attributes before
RID_ASM token and warn for them with -Wattributes.

* g++.dg/DRs/dr2262.C: New test.
* g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors
on attributes on asm definitions.
* g++.dg/gomp/attrs-11.C: Remove 2 expected errors.

middle-end/112860 - -fgimple can skip ISEL
Richard Biener [Tue, 5 Dec 2023 13:24:34 +0000 (14:24 +0100)] 
middle-end/112860 - -fgimple can skip ISEL

The following makes sure we don't skip ISEL.

PR middle-end/112860
* passes.cc (should_skip_pass_p): Do not skip ISEL.

PR modula2/112865 IM and RE fails to skip type equivalences
Gaius Mulley [Tue, 5 Dec 2023 14:54:00 +0000 (14:54 +0000)] 
PR modula2/112865 IM and RE fails to skip type equivalences

This patch skips type equivalences when checking IM and RE
ISO M2 standard functions for complex data type operands.

gcc/m2/ChangeLog:

PR modula2/112865
* gm2-compiler/M2Quads.mod (BuildReFunction): Use
GetDType to retrieve the type of the operand when
converting the complex type to its scalar equivalent.
(BuildImFunction): Use GetDType to retrieve the type of the
operand when converting the complex type to its scalar
equivalent.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

sanitizer/111736 - skip ASAN for globals in alternate address-space
Richard Biener [Tue, 5 Dec 2023 13:00:43 +0000 (14:00 +0100)] 
sanitizer/111736 - skip ASAN for globals in alternate address-space

PR sanitizer/111736
* asan.cc (asan_protect_global): Do not protect globals
in non-generic address-space.

ipa/92606 - IPA ICF merging variables in different address-space
Richard Biener [Tue, 5 Dec 2023 12:56:10 +0000 (13:56 +0100)] 
ipa/92606 - IPA ICF merging variables in different address-space

The following avoids merging variables that are put in different
address-spaces.

PR ipa/92606
* ipa-icf.cc (sem_variable::equals_wpa): Compare address-spaces.

middle-end/112830 - avoid gimplifying non-default addr-space assign to memcpy
Richard Biener [Mon, 4 Dec 2023 09:35:38 +0000 (10:35 +0100)] 
middle-end/112830 - avoid gimplifying non-default addr-space assign to memcpy

The following avoids turning aggregate copies involving non-default
address-spaces into memcpy since memcpy is not prepared to handle them.

GIMPLE verification no longer accepts WITH_SIZE_EXPR in aggregate
copies, the following re-allows that for the RHS.  I also needed
to adjust one assert in DCE.

get_memory_address is used for string builtin expansion, so instead
of fixing that up for non-generic address-spaces I've put an assert
there.

I'll note that the same issue exists for initialization from an
empty CTOR, which we gimplify to a memset call.  But since we are
not prepared to handle RTL expansion of the original VLA init, and
I failed to provide test coverage (without extending the GNU C
extension for VLA structs, given that the Ada frontend and other
frontends do not have address-space support), the patch instead
asserts that we only see generic address-spaces there.

PR middle-end/112830
* gimplify.cc (gimplify_modify_expr): Avoid turning aggregate
copy of non-generic address-spaces to memcpy.
(gimplify_modify_expr_to_memcpy): Assert we are dealing with
a copy inside the generic address-space.
(gimplify_modify_expr_to_memset): Likewise.
* tree-cfg.cc (verify_gimple_assign_single): Allow
WITH_SIZE_EXPR as part of the RHS of an assignment.
* builtins.cc (get_memory_address): Assert we are dealing
with the generic address-space.
* tree-ssa-dce.cc (ref_may_be_aliased): Handle WITH_SIZE_EXPR.

* gcc.target/avr/pr112830.c: New testcase.
* gcc.target/i386/pr112830.c: Likewise.

tree-optimization/112856 - fix LC SSA after loop header copying
Richard Biener [Tue, 5 Dec 2023 07:50:57 +0000 (08:50 +0100)] 
tree-optimization/112856 - fix LC SSA after loop header copying

When loop header copying unloops loops we possibly have to fix up
LC SSA.  I've taken the opportunity to streamline the unloop_loops
API, removing the use of an ivcanon-local global variable.

PR tree-optimization/109689
PR tree-optimization/112856
* cfgloopmanip.h (unloop_loops): Adjust API.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Take edges_to_remove
as parameter.
(canonicalize_induction_variables): Adjust.
(tree_unroll_loops_completely): Likewise.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Rewrite into
LC SSA if we unlooped some loops and we are in LC SSA.

* gcc.dg/torture/pr109689.c: New testcase.
* gcc.dg/torture/pr112856.c: Likewise.

i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
Jakub Jelinek [Tue, 5 Dec 2023 12:17:57 +0000 (13:17 +0100)] 
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]

The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
I added a while back to use smaller code than movabsq if possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates
an invalid insn (as a 0xfa1e0ff3 CONST_INT is not allowed as
x86_64_immediate_operand nor x86_64_zext_immediate_operand), and the
peephole2 even triggers on it again and again (this time with shift 0)
until it gives up.

The following patch fixes that.  As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.

aarch64: Add support for SME2 intrinsics
Richard Sandiford [Tue, 5 Dec 2023 10:24:02 +0000 (10:24 +0000)] 
aarch64: Add support for SME2 intrinsics

This patch adds support for the SME2 <arm_sme.h> intrinsics.  The
convention I've used is to put stuff in aarch64-sve-builtins-sme.*
if it relates to ZA, ZT0, the streaming vector length, or other
such SME state.  Things that operate purely on predicates and
vectors go in aarch64-sve-builtins-sve2.* instead.  Some of these
will later be picked up for SVE2p1.

We previously used Uph internally as a constraint for 16-bit
immediates to atomic instructions.  However, we need a user-facing
constraint for the upper predicate registers (already available as
PR_HI_REGS), and Uph makes a natural pair with the existing Upl.

gcc/
* config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro.
(P_ALIASES): Likewise.
(REGISTER_NAMES): Add pn aliases of the predicate registers.
(W8_W11_REGNUM_P): New macro.
(W8_W11_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
* config/aarch64/aarch64.cc (aarch64_print_operand): Add support
for %K, which prints a predicate as a counter.  Handle tuples of
predicates.
(aarch64_regno_regclass): Handle W8_W11_REGS.
(aarch64_class_max_nregs): Likewise.
* config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints.
(x, y): Move further up file.
(Uph): Redefine as the high predicate registers, renaming the old
constraint to...
(Uih): ...this.
* config/aarch64/predicates.md (const_0_to_7_operand): New predicate.
(const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise.
(const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise.
(aarch64_simd_shift_imm_qi): Use const_0_to_7_operand.
* config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY)
(VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4)
(SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24)
(SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24)
(SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24)
(SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators.
(UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs.
(UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN)
(UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN)
(UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB)
(UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise.
(UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise.
(UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise.
(UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT)
(UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ)
(UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS)
(UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise.
(UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise.
(UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise.
(UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise.
(Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv)
(VSINGLE, vsingle, b): Add tuple modes.
(v2xwide, za32_offset_range, za64_offset_range, za32_long)
(za32_last_offset, vg_modifier, z_suffix, aligned_operand)
(aligned_fpr): New mode attributes.
(SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI)
(SVE_FP_BINARY_MULTI): New int iterators.
(SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN)
(SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ)
(UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI)
(SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD)
(SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE)
(SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS)
(LUTI_BITS): New int iterators.
(optab, sve_int_op): Handle the new unspecs.
(sme_int_op, has_16bit_form): New int attributes.
(bits_etype): Handle 64.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
* config/aarch64/atomics.md (cas_short_expected_imm): Use Uhi
rather than Uph for HImode immediates.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): New patterns.
(@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to...
(@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>)
(@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>):
...these new patterns.
(SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants.  Add
SVE_WHILE_B to existing while patterns.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>)
(@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2)
(@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus)
(@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2)
(<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>)
(@aarch64_sve_<sve_int_op><mode>): New patterns.
(@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>)
(*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>)
(@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x)
(@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2)
(@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns.
(@aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(*aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise.
(aarch64_sve_fdotvnx4sfvnx8hf): Likewise.
(aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise.
(@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise.
(truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise.
(<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise.
(@aarch64_sve_sel<mode>): Likewise.
(@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise.
(@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise.
(@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise.
(@aarch64_sve_<optab><mode>): Likewise.
* config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>)
(*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>)
(*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns.
(*aarch64_sme_write<mode>_plus, aarch64_sme_zero_zt0): Likewise.
(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
(@aarch64_sme_single_<optab><mode>): Likewise.
(*aarch64_sme_single_<optab><mode>_plus): Likewise.
(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>)
(*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(@aarch64_sme_lut<LUTI_BITS><mode>): Likewise.
(UNSPEC_SME_LUTI): New unspec.
* config/aarch64/aarch64-sve-builtins.def (single): New mode suffix.
(c8, c16, c32, c64): New type suffixes.
(vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2)
(vg4x4): New group suffixes.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0)
(CP_WRITE_ZT0): New constants.
(get_svbool_t): Delete.
(function_resolver::report_mismatched_num_vectors): New member
function.
(function_resolver::resolve_conversion): Likewise.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_nonscalar_type): Likewise.
(function_resolver::finish_opt_single_resolution): Likewise.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter.
(function_expander::map_to_rtx_codes): Add an extra parameter
for unconditional FP unspecs.
(function_instance::gp_type_index): New member function.
(function_instance::gp_type): Likewise.
(function_instance::gp_mode): Handle multi-vector operations.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count)
(TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen)
(TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2)
(TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4)
(TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu)
(TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer)
(TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned)
(TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type
macros.
(groups_x2, groups_x12, groups_x4, groups_x24, groups_x124)
(groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4)
(groups_vg24): New group arrays.
(function_instance::reads_global_state_p): Handle CP_READ_ZT0.
(function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0.
(add_shared_state_attribute): Handle zt0 state.
(function_builder::add_overloaded_functions): Skip MODE_single
for non-tuple groups.
(function_resolver::report_mismatched_num_vectors): New function.
(function_resolver::resolve_to): Add a fallback error message for
the general two-type case.
(function_resolver::resolve_conversion): New function.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_matching_vector_type): Specifically
diagnose mismatched vector counts.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter.  Extend to handle cases where
tuples are expected.
(function_resolver::require_nonscalar_type): New function.
(function_resolver::check_gp_argument): Use gp_type_index rather
than hard-coding VECTOR_TYPE_svbool_t.
(function_resolver::finish_opt_single_resolution): New function.
(function_checker::require_immediate_either_or): Remove hard-coded
constants.
(function_expander::direct_optab_handler): New function.
(function_expander::use_pred_x_insn): Only add a strictness flag
if the insn has an operand for it.
(function_expander::map_to_rtx_codes): Take an unconditional
FP unspec as an extra parameter.  Handle tuples and MODE_single.
(function_expander::map_to_unspecs): Handle tuples and MODE_single.
* config/aarch64/aarch64-sve-builtins-functions.h (read_zt0)
(write_zt0): New typedefs.
(full_width_access::memory_vector): Use the function's
vectors_per_tuple.
(rtx_code_function_base): Add an optional unconditional FP unspec.
(rtx_code_function::expand): Update accordingly.
(rtx_code_function_rotated::expand): Likewise.
(unspec_based_function_exact_insn::expand): Use tuple_mode instead
of vector_mode.
(unspec_based_uncond_function): New typedef.
(cond_or_uncond_unspec_function): New class.
(sme_1mode_function::expand): Handle single forms.
(sme_2mode_function_t): Likewise, adding a template parameter for them.
(sme_2mode_function): Update accordingly.
(sme_2mode_lane_function): New typedef.
(multireg_permute): New class.
(class integer_conversion): Likewise.
(while_comparison::expand): Handle svcount_t and svboolx2_t results.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp, compare_scalar_count, count_pred_c)
(dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane)
(extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice)
(select_pred, shift_right_imm_narrowxn, storexn, str_zt)
(unary_convertxn, unary_za_slice, unaryxn, write_za)
(write_za_slice): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(za_group_is_pure_overload): New function.
(apply_predication): Use the function's gp_type for the predicate,
instead of hard-coding the use of svbool_t.
(parse_element_type): Add support for "c" (svcount_t).
(parse_type): Add support for "c0" and "c1" (conversion destination
and source types).
(binary_za_slice_lane_base): New class.
(binary_za_slice_opt_single_base): Likewise.
(load_contiguous_base::resolve): Pass the group suffix to r.resolve.
(luti_lane_zt_base): New class.
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp): New shapes.
(compare_scalar_def::build): Allow the return type to be a tuple.
(compare_scalar_def::expand): Pass the group suffix to r.resolve.
(compare_scalar_count, count_pred_c, dot_za_slice_int_lane)
(dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt)
(ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn)
(storexn, str_zt): New shapes.
(ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with...
(ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these
new classes.  Allow a second suffix that specifies the type of the
second vector argument, and that is used to derive the third.
(unary_def::build): Extend to handle tuple types.
(unary_convert_def::build): Use the new c0 and c1 format specifiers.
(unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes.
(write_za_slice): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand)
(svext_bhw_impl::expand): Update call to map_to_rtx_codes.
(svcntp_impl::expand): Handle svcount_t variants.
(svcvt_impl::expand): Handle unpredicated conversions separately,
dealing with tuples.
(svdot_impl::expand): Handle 2-way dot products.
(svdotprod_lane_impl::expand): Likewise.
(svld1_impl::fold): Punt on tuple loads.
(svld1_impl::expand): Handle tuple loads.
(svldnt1_impl::expand): Likewise.
(svpfalse_impl::fold): Punt on svcount_t forms.
(svptrue_impl::fold): Likewise.
(svptrue_impl::expand): Handle svcount_t forms.
(svrint_impl): New class.
(svsel_impl::fold): Punt on tuple forms.
(svsel_impl::expand): Handle tuple forms.
(svst1_impl::fold): Punt on tuple loads.
(svst1_impl::expand): Handle tuple loads.
(svstnt1_impl::expand): Likewise.
(svwhilelx_impl::fold): Punt on tuple forms.
(svdot_lane): Use UNSPEC_FDOT.
(svmax, svmaxnm, svmin, svminmm): Add unconditional FP unspecs.
(rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl.
* config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2)
(svset2, svundef2): Add _b variants.
(svcvt): Use unary_convertxn.
(svdot): Use ternary_qq_opt_n_or_011.
(svdot_lane): Use ternary_qq_or_011_lane.
(svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n.
(svpfalse): Add a form that returns svcount_t results.
(svrinta, svrintm, svrintn, svrintp): Use unaryxn.
(svsel): Use binaryxn.
(svst1, svstnt1): Use storexn.
* config/aarch64/aarch64-sve-builtins-sme.h
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): Declare.
* config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base):
Rename to...
(load_store_za_zt0_base): ...this and extend to tuples.
(load_za_base, store_za_base): Update accordingly.
(expand_ldr_str_zt0): New function.
(svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl)
(svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes.
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): New functions.
* config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl)
(svcvtn_impl, svpext_impl, svpsel_impl): New classes.
(svqrshl_impl::fold): Update for change to svrshl shape.
(svrshl_impl::fold): Punt on tuple forms.
(svsqadd_impl::expand): Update call to map_to_rtx_codes.
(svunpk_impl): New class.
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): New functions.
* config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine __ARM_FEATURE_SME2.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way
for test functions to share ZT0.
(ATTR): Update accordingly.
(TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN)
(TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C)
(TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE)
(TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW)
(TEST_X4_NARROW): New macros.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove
test for svmopa that becomes valid with SME2.
* gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for
existence of svboolx2_t version of svcreate2.
* gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error
messages to account for svcount_t predication.
* gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust
error messages to account for new SME2 variants.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.

aarch64: Add ZT0
Richard Sandiford [Tue, 5 Dec 2023 10:24:01 +0000 (10:24 +0000)] 
aarch64: Add ZT0

SME2 adds a 512-bit lookup table called ZT0.  It is enabled
and disabled by PSTATE.ZA, just like ZA itself.  This patch
adds support for the register, including saving and restoring
contents.

The code reuses the V8DI that was added for LS64, including
the associated memory classification rules.  (The ZT0 range
is more restricted than the LS64 range, but that's enforced
by predicates and constraints.)

gcc/
* config/aarch64/aarch64.md (ZT0_REGNUM): New constant.
(LAST_FAKE_REGNUM): Bump to include it.
* config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0.
(CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(machine_function): Add zt0_save_buffer.
(CUMULATIVE_ARGS): Add shared_zt0_flags.
* config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0.
(aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise.
(aarch64_function_arg): Add the shared ZT0 flags as an extra
limb of the parallel.
(aarch64_init_cumulative_args): Initialize shared_zt0_flags.
(aarch64_extra_live_on_entry): Handle ZT0_REGNUM.
(aarch64_epilogue_uses): Likewise.
(aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions.
(aarch64_restore_zt0): Likewise.
(aarch64_start_call_args): Reject calls to functions that share
ZT0 from functions that have no ZT0 state.  Save ZT0 around shared-ZA
calls that do not share ZT0.
(aarch64_expand_call): Handle ZT0.  Reject calls to functions that
share ZT0 but not ZA from functions with ZA state.
(aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions
that do not share ZT0.
(aarch64_set_current_function): Require +sme2 for functions that
have ZT0 state.
(aarch64_function_attribute_inlinable_p): Don't allow functions to
be inlined if they have local zt0 state.
(AARCH64_IPA_CLOBBERS_ZT0): New constant.
(aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0.
(aarch64_can_inline_p): Don't inline callees that clobber ZT0
into functions that have ZT0 state.
(aarch64_comp_type_attributes): Check for compatible ZT0 sharing.
(aarch64_optimize_mode_switching): Use mode switching if the
function has ZT0 state.
(aarch64_mode_emit_local_sme_state): Save and restore ZT0 around
calls to private-ZA functions.
(aarch64_mode_needed_local_sme_state): Require ZA to be active
for instructions that access ZT0.
(aarch64_mode_entry): Mark ZA as dead on entry if the function
only shares state other than "za" itself.
(aarch64_mode_exit): Likewise mark ZA as dead on return.
(aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __ARM_STATE_ZT0.
* config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv.
(aarch64_asm_update_zt0): New insn.
(UNSPEC_RESTORE_ZT0): New unspec.
(aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns.
(aarch64_sme_str_zt0): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sme/zt0_state_1.c: New test.
* gcc.target/aarch64/sme/zt0_state_2.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_3.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_4.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_5.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_6.c: Likewise.

aarch64: Add svboolx2_t
Richard Sandiford [Tue, 5 Dec 2023 10:24:01 +0000 (10:24 +0000)] 
aarch64: Add svboolx2_t

SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.

The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work.  At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework.  All that changes on the PCS side is
that we now have an associated mode.
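
A hedged usage sketch, assuming the svcreate2/svget2 _b overloads added
elsewhere in this series: svboolx2_t behaves like a two-element tuple of
svbool_t.

    #include <arm_sve.h>

    svbool_t
    second_half (svbool_t a, svbool_t b)
    {
      svboolx2_t pair = svcreate2 (a, b);   // resolves to the new _b variant
      return svget2 (pair, 1);
    }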

gcc/
* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(register_tuple_type): Handle tuples of predicates.
(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
of predicates.
(pure_scalable_type_info::add_piece): Don't try to form pairs of
predicates.
(VEC_STRUCT): Generalize comment.
(aarch64_classify_vector_mode): Handle VNx32BI.
(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
predicates that have no associated mode, rather than allowing
an integer mode to be chosen.
(aarch64_hard_regno_nregs): Handle VNx32BI.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_split_double_move): New function, split out from...
(aarch64_split_128bit_move): ...here.
(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
(aarch64_pfalse_reg): Likewise.
(aarch64_sve_same_pred_for_ptest_p): Likewise.
(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
(aarch64_expand_mov_immediate): Restrict handling of boolean vector
constants to single-predicate modes.
(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
can be addressed.
(aarch64_class_max_nregs): Handle VNx32BI.
(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
VNx32BI.
(aarch64_mov_operand_p): Restrict predicate constant canonicalization
to single-predicate modes.
(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
* config/aarch64/constraints.md (PR_REGS): New predicate.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
stack offsets.
(ret_nonpst3): Remove XFAIL.
* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.

aarch64: Add svcount_t
Richard Sandiford [Tue, 5 Dec 2023 10:24:00 +0000 (10:24 +0000)] 
aarch64: Add svcount_t

Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks.  The SME2 ACLE defines an svcount_t type for
this interpretation.

I don't think we have a better way of representing counters than
the VNx16BI that we use for masks.  The patch therefore doesn't
add a new mode for this representation.  It's just something that
is interpreted in context, a bit like signed vs. unsigned integers.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Handle reinterprets between svbool_t
and svcount_t.
(svreinterpret_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add
b<->c forms.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New
type suffix list.
(wrap_type_in_struct, register_type_decl): New functions, split out
from...
(register_tuple_type): ...here.
(register_builtin_types): Handle svcount_t.
(handle_arm_sve_h): Don't create tuples of svcount_t.
* config/aarch64/aarch64-sve-builtins.def (svcount_t): New type.
(c): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class.

gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test
for svcount_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
* g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P)
(TEST_DUAL_P_REV): New macros.
* gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test.
* gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing
an svcount_t.
* gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test
reinterprets involving svcount_t.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t.
* gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_12.c: New test.

aarch64: Add +sme2
Richard Sandiford [Tue, 5 Dec 2023 10:24:00 +0000 (10:24 +0000)] 
aarch64: Add +sme2

gcc/
* doc/invoke.texi: Document +sme2.
* doc/sourcebuild.texi: Document aarch64_sme2.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
Add sme2.
* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme2): New
target test.
(check_effective_target_aarch64_asm_sme2_ok): Likewise.

aarch64: Update sibcall handling for SME
Richard Sandiford [Tue, 5 Dec 2023 10:11:30 +0000 (10:11 +0000)] 
aarch64: Update sibcall handling for SME

We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").

Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function.  Any function can tail-call a streaming-compatible function.
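
As a hypothetical illustration (using the ACLE keyword attributes; not one
of the new tests):

    void normal_callee ();
    void streaming_callee () __arm_streaming;
    void compatible_callee () __arm_streaming_compatible;

    // A non-streaming caller may tail-call a non-streaming or a
    // streaming-compatible callee...
    void tail_ok ()    { compatible_callee (); }
    // ...but a streaming callee needs a PSTATE.SM switch around the call,
    // so this one cannot become a sibcall.
    void no_sibcall () { streaming_callee (); }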

gcc/
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
Enforce PSTATE.SM and PSTATE.ZA restrictions.
(aarch64_expand_epilogue): Save and restore the arguments
to a sibcall around any change to PSTATE.SM.

gcc/testsuite/
* gcc.target/aarch64/sme/sibcall_1.c: New test.
* gcc.target/aarch64/sme/sibcall_2.c: Likewise.
* gcc.target/aarch64/sme/sibcall_3.c: Likewise.
* gcc.target/aarch64/sme/sibcall_4.c: Likewise.
* gcc.target/aarch64/sme/sibcall_5.c: Likewise.
* gcc.target/aarch64/sme/sibcall_6.c: Likewise.
* gcc.target/aarch64/sme/sibcall_7.c: Likewise.
* gcc.target/aarch64/sme/sibcall_8.c: Likewise.

aarch64: Enforce inlining restrictions for SME
Richard Sandiford [Tue, 5 Dec 2023 10:11:30 +0000 (10:11 +0000)] 
aarch64: Enforce inlining restrictions for SME

A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.

A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.

A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.

gcc/
* config/aarch64/aarch64.cc: Include symbol-summary.h, ipa-prop.h,
and ipa-fnsummary.h
(aarch64_function_attribute_inlinable_p): New function.
(AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA): New constants.
(aarch64_need_ipa_fn_target_info): New function.
(aarch64_update_ipa_fn_target_info): Likewise.
(aarch64_can_inline_p): Restrict the previous ISA flag checks
to non-modal features.  Prevent callees that require a particular
PSTATE.SM state from being inlined into callers that can't guarantee
that state.  Also prevent callees that have ZA state from being
inlined into callers that don't.  Finally, prevent callees that
clobber ZA from being inlined into callers that have ZA state.
(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sme/inlining_1.c: New test.
* gcc.target/aarch64/sme/inlining_2.c: Likewise.
* gcc.target/aarch64/sme/inlining_3.c: Likewise.
* gcc.target/aarch64/sme/inlining_4.c: Likewise.
* gcc.target/aarch64/sme/inlining_5.c: Likewise.
* gcc.target/aarch64/sme/inlining_6.c: Likewise.
* gcc.target/aarch64/sme/inlining_7.c: Likewise.
* gcc.target/aarch64/sme/inlining_8.c: Likewise.

aarch64: Handle PSTATE.SM across abnormal edges
Richard Sandiford [Tue, 5 Dec 2023 10:11:29 +0000 (10:11 +0000)] 
aarch64: Handle PSTATE.SM across abnormal edges

PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver.  Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was originally using.

The requirement on nonlocal goto receivers means that nonlocal
jumps need to ensure that PSTATE.SM is zero.

gcc/
* config/aarch64/aarch64.cc: Include except.h
(aarch64_sme_mode_switch_regs::add_call_preserved_reg): New function.
(aarch64_sme_mode_switch_regs::add_call_preserved_regs): Likewise.
(aarch64_need_old_pstate_sm): Return true if the function has
a nonlocal-goto or exception receiver.
(aarch64_switch_pstate_sm_for_landing_pad): New function.
(aarch64_switch_pstate_sm_for_jump): Likewise.
(pass_switch_pstate_sm::gate): Enable the pass for all
streaming and streaming-compatible functions.
(pass_switch_pstate_sm::execute): Handle non-local gotos and their
receivers.  Handle exception handler entry points.

gcc/testsuite/
* g++.target/aarch64/sme/exceptions_2.C: New test.
* gcc.target/aarch64/sme/nonlocal_goto_1.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_4.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_5.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_6.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_7.c: Likewise.

aarch64: Add support for __arm_locally_streaming
Richard Sandiford [Tue, 5 Dec 2023 10:11:29 +0000 (10:11 +0000)] 
aarch64: Add support for __arm_locally_streaming

This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI.  The attribute is valid but redundant for
__arm_streaming functions.
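
A minimal sketch of how the attribute is used (assuming the ACLE spelling):

    // f has a normal, non-streaming type and ABI, but its body executes in
    // streaming mode: the prologue switches PSTATE.SM on and the epilogue
    // switches it back off.
    __arm_locally_streaming void
    f ()
    {
      // streaming-only instructions and the streaming vector length are
      // usable here without callers having to know about it
    }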

gcc/
* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
arm::locally_streaming.
(aarch64_fndecl_is_locally_streaming): New function.
(aarch64_fndecl_sm_state): Handle locally-streaming functions.
(aarch64_cfun_enables_pstate_sm): New function.
(aarch64_add_offset): Add an argument that specifies whether
the streaming vector length should be used instead of the
prevailing one.
(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_mov_immediate): Update calls accordingly.
(aarch64_need_old_pstate_sm): Return true for locally-streaming
streaming-compatible functions.
(aarch64_layout_frame): Force all call-preserved Z and P registers
to be saved and restored if the function switches PSTATE.SM in the
prologue.
(aarch64_get_separate_components): Disable shrink-wrapping of
such Z and P saves and restores.
(aarch64_use_late_prologue_epilogue): New function.
(aarch64_expand_prologue): Measure SVE lengths in the streaming
vector length for locally-streaming functions, then emit code
to enable streaming mode.
(aarch64_expand_epilogue): Likewise in reverse.
(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_locally_streaming.

gcc/testsuite/
* gcc.target/aarch64/sme/locally_streaming_1.c: New test.
* gcc.target/aarch64/sme/locally_streaming_2.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_3.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_4.c: Likewise.
* gcc.target/aarch64/sme/keyword_macros_1.c: Add
__arm_locally_streaming.
* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.

aarch64: Add support for <arm_sme.h>
Richard Sandiford [Tue, 5 Dec 2023 10:11:28 +0000 (10:11 +0000)] 
aarch64: Add support for <arm_sme.h>

This adds support for the SME parts of arm_sme.h.

gcc/
* doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64.
* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
to install and aarch64-sve-builtins-sme.o to the list of objects
to build.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine TARGET_SME, TARGET_SME_I16I64 and TARGET_SME_F64F64.
(aarch64_pragma_aarch64): Handle arm_sme.h.
* config/aarch64/aarch64-option-extensions.def (sme-i16i64)
(sme-f64f64): New extensions.
* config/aarch64/aarch64-protos.h (aarch64_sme_vq_immediate)
(aarch64_addsvl_addspl_immediate_p, aarch64_output_addsvl_addspl)
(aarch64_output_sme_zero_za): Declare.
(aarch64_output_move_struct): Delete.
(aarch64_sme_ldr_vnum_offset): Declare.
(aarch64_sve::handle_arm_sme_h): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_SM_ON): New macro.
(AARCH64_ISA_SME_I16I64, AARCH64_ISA_SME_F64F64): Likewise.
(TARGET_STREAMING, TARGET_STREAMING_SME): Likewise.
(TARGET_SME_I16I64, TARGET_SME_F64F64): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_rdvl_factor_p): Rename to...
(aarch64_sve_rdvl_addvl_factor_p): ...this.
(aarch64_sve_rdvl_immediate_p): Update accordingly.
(aarch64_rdsvl_immediate_p, aarch64_add_offset): Likewise.
(aarch64_sme_vq_immediate): Likewise.  Make public.
(aarch64_sve_addpl_factor_p): New function.
(aarch64_sve_addvl_addpl_immediate_p): Use
aarch64_sve_rdvl_addvl_factor_p and aarch64_sve_addpl_factor_p.
(aarch64_addsvl_addspl_immediate_p): New function.
(aarch64_output_addsvl_addspl): Likewise.
(aarch64_cannot_force_const_mem): Return true for RDSVL immediates.
(aarch64_classify_index): Handle .Q scaling for VNx1TImode.
(aarch64_classify_address): Likewise for vnum offsets.
(aarch64_output_sme_zero_za): New function.
(aarch64_sme_ldr_vnum_offset_p): Likewise.
* config/aarch64/predicates.md (aarch64_addsvl_addspl_immediate):
New predicate.
(aarch64_pluslong_operand): Include it for SME.
* config/aarch64/constraints.md (Ucj, Uav): New constraints.
* config/aarch64/iterators.md (VNx1TI_ONLY): New mode iterator.
(SME_ZA_I, SME_ZA_SDI, SME_ZA_SDF_I, SME_MOP_BHI): Likewise.
(SME_MOP_HSDF): Likewise.
(UNSPEC_SME_ADDHA, UNSPEC_SME_ADDVA, UNSPEC_SME_FMOPA)
(UNSPEC_SME_FMOPS, UNSPEC_SME_LD1_HOR, UNSPEC_SME_LD1_VER)
(UNSPEC_SME_READ_HOR, UNSPEC_SME_READ_VER, UNSPEC_SME_SMOPA)
(UNSPEC_SME_SMOPS, UNSPEC_SME_ST1_HOR, UNSPEC_SME_ST1_VER)
(UNSPEC_SME_SUMOPA, UNSPEC_SME_SUMOPS, UNSPEC_SME_UMOPA)
(UNSPEC_SME_UMOPS, UNSPEC_SME_USMOPA, UNSPEC_SME_USMOPS)
(UNSPEC_SME_WRITE_HOR, UNSPEC_SME_WRITE_VER): New unspecs.
(elem_bits): Handle x2 and x4 structure modes, plus VNx1TI.
(Vetype, Vesize, VPRED): Handle VNx1TI.
(b): New mode attribute.
(SME_LD1, SME_READ, SME_ST1, SME_WRITE, SME_BINARY_SDI, SME_INT_MOP)
(SME_FP_MOP): New int iterators.
(optab): Handle SME unspecs.
(hv): New int attribute.
* config/aarch64/aarch64.md (*add<mode>3_aarch64): Handle ADDSVL
and ADDSPL.
* config/aarch64/aarch64-sme.md (UNSPEC_SME_LDR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_ldr0, @aarch64_sme_ldrn<mode>): New patterns.
(UNSPEC_SME_STR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_str0, @aarch64_sme_strn<mode>): New patterns.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(UNSPEC_SME_ZERO): New unspec.
(aarch64_sme_zero): New pattern.
(@aarch64_sme_<SME_BINARY_SDI:optab><mode>): Likewise.
(@aarch64_sme_<SME_INT_MOP:optab><mode>): Likewise.
(@aarch64_sme_<SME_FP_MOP:optab><mode>): Likewise.
* config/aarch64/aarch64-sve-builtins.def: Add ZA type suffixes.
Include aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION): New macro.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZA): New call
property.
(CP_WRITE_ZA): Likewise.
(PRED_za_m): New predication type.
(type_suffix_index): Handle DEF_SME_ZA_SUFFIX.
(type_suffix_info): Add vector_p and za_p fields.
(function_instance::num_za_tiles): New member function.
(function_builder::get_attributes): Add an aarch64_feature_flags
argument.
(function_expander::get_contiguous_base): Take a base argument
number, a vnum argument number, and an argument that indicates
whether the vnum parameter is a factor of the SME vector length
or the prevailing vector length.
(function_expander::add_integer_operand): Take a poly_int64.
(sve_switcher::sve_switcher): Take a base set of flags.
(sme_switcher): New class.
(scalar_types): Add a null entry for NUM_VECTOR_TYPES.
* config/aarch64/aarch64-sve-builtins.cc: Include
aarch64-sve-builtins-sme.h.
(pred_suffixes): Add an entry for PRED_za_m.
(type_suffixes): Initialize vector_p and za_p.  Handle ZA suffixes.
(TYPES_all_za, TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data)
(TYPES_za_s_integer, TYPES_za_d_integer, TYPES_mop_base)
(TYPES_mop_base_signed, TYPES_mop_base_unsigned, TYPES_mop_i16i64)
(TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned, TYPES_za): New
type suffix macros.
(preds_m, preds_za_m): New predication lists.
(function_groups): Handle DEF_SME_ZA_FUNCTION.
(scalar_types): Add an entry for NUM_VECTOR_TYPES.
(find_type_suffix_for_scalar_type): Check positively for vectors
rather than negatively for predicates.
(check_required_extensions): Handle PSTATE.SM and PSTATE.ZA
requirements.
(report_out_of_range): Handle the case where the minimum and
maximum are the same.
(function_instance::reads_global_state_p): Return true for functions
that read ZA.
(function_instance::modifies_global_state_p): Return true for functions
that write to ZA.
(sve_switcher::sve_switcher): Add a base flags argument.
(function_builder::get_name): Handle "__arm_" prefixes.
(add_attribute): Add an overload that takes a namespace.
(add_shared_state_attribute): New function.
(function_builder::get_attributes): Take the required feature flags
as argument.  Add streaming and ZA attributes where appropriate.
(function_builder::add_unique_function): Update calls accordingly.
(function_resolver::check_gp_argument): Assert that the predication
isn't ZA _m predication.
(function_checker::function_checker): Don't bias the argument
number for ZA _m predication.
(function_expander::get_contiguous_base): Add arguments that
specify the base argument number, the vnum argument number,
and an argument that indicates whether the vnum parameter is
a factor of the SME vector length or the prevailing vector length.
Handle the SME case.
(function_expander::add_input_operand): Handle pmode_register_operand.
(function_expander::add_integer_operand): Take a poly_int64.
(init_builtins): Call handle_arm_sme_h for LTO.
(handle_arm_sve_h): Skip SME intrinsics.
(handle_arm_sme_h): New function.
* config/aarch64/aarch64-sve-builtins-functions.h
(read_write_za, write_za): New classes.
(unspec_based_sme_function, za_arith_function): New using aliases.
(quiet_za_arith_function): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_za_int_m, binary_za_m, binary_za_uint_m, bool_inherent)
(inherent_za, inherent_mask_za, ldr_za, load_za, read_za_m, store_za)
(str_za, unary_za_m, write_za_m): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
Expect za_m functions to have an existing governing predicate.
(binary_za_m_base, binary_za_int_m_def, binary_za_m_def): New classes.
(binary_za_uint_m_def, bool_inherent_def, inherent_za_def): Likewise.
(inherent_mask_za_def, ldr_za_def, load_za_def, read_za_m_def)
(store_za_def, str_za_def, unary_za_m_def, write_za_m_def): Likewise.
* config/aarch64/arm_sme.h: New file.
* config/aarch64/aarch64-sve-builtins-sme.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): Depend on
aarch64-sve-builtins-sme.def and aarch64-sve-builtins-sme.h.
(aarch64-sve-builtins-sme.o): New rule.

gcc/testsuite/
* lib/target-supports.exp: Add sme and sme-i16i64 features.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test __ARM_FEATURE_SME*
macros.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Allow functions
to be marked as __arm_streaming, __arm_streaming_compatible, and
__arm_inout("za").
* g++.target/aarch64/sve/acle/general-c++/func_redef_4.c: Mark the
function as __arm_streaming_compatible.
* g++.target/aarch64/sve/acle/general-c++/func_redef_5.c: Likewise.
* g++.target/aarch64/sve/acle/general-c++/func_redef_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_4.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_5.c: Likewise.
* g++.target/aarch64/sme/aarch64-sme-acle-asm.exp: New test harness.
* gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c: Likewise.

5 months agoaarch64: Generalise _m rules for SVE intrinsics
Richard Sandiford [Tue, 5 Dec 2023 10:11:28 +0000 (10:11 +0000)] 
aarch64: Generalise _m rules for SVE intrinsics

In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.

That rule began to break down in SVE2, and it continues to do
so in SME.  This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.
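
For reference, a sketch of the existing convention in ACLE terms
(standard arm_sve.h names; not part of the patch):

  #include <arm_sve.h>

  svint32_t
  merge_example (svbool_t pg, svint32_t inactive, svint32_t a, svint32_t b)
  {
    /* Unary _m: a separate first argument supplies the inactive lanes.  */
    svint32_t x = svabs_s32_m (inactive, pg, a);

    /* Binary _m: inactive lanes are taken from the first operand, A.  */
    svint32_t y = svadd_s32_m (pg, a, b);

    return svadd_s32_x (pg, x, y);
  }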

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_shape::has_merge_argument_p): New member function.
* config/aarch64/aarch64-sve-builtins.cc:
(function_resolver::check_gp_argument): Use it.
(function_expander::get_fallback_value): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(apply_predication): Likewise.
(unary_convert_narrowt_def::has_merge_argument_p): New function.

5 months agoaarch64: Generalise unspec_based_function_base
Richard Sandiford [Tue, 5 Dec 2023 10:11:27 +0000 (10:11 +0000)] 
aarch64: Generalise unspec_based_function_base

Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.

gcc/
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Allow type suffix 1 to determine
the mode of the operation.
(unspec_based_function): Update accordingly.
(unspec_based_fused_function): Likewise.
(unspec_based_fused_lane_function): Likewise.

5 months agoaarch64: Add a VNx1TI mode
Richard Sandiford [Tue, 5 Dec 2023 10:11:27 +0000 (10:11 +0000)] 
aarch64: Add a VNx1TI mode

Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.

gcc/
* config/aarch64/aarch64-modes.def: Add VNx1TI.

5 months agoaarch64: Add a register class for w12-w15
Richard Sandiford [Tue, 5 Dec 2023 10:11:26 +0000 (10:11 +0000)] 
aarch64: Add a register class for w12-w15

Some SME instructions use w12-w15 to index ZA.  This patch
adds a register class for that range.

gcc/
* config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro.
(W12_W15_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
* config/aarch64/aarch64.cc (aarch64_regno_regclass)
(aarch64_class_max_nregs, aarch64_register_move_cost): Handle
W12_W15_REGS.

5 months agoaarch64: Add support for SME ZA attributes
Richard Sandiford [Tue, 5 Dec 2023 10:11:26 +0000 (10:11 +0000)] 
aarch64: Add support for SME ZA attributes

SME has an array called ZA that can be enabled and disabled separately
from streaming mode.  A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function
attributes.  There are four attributes that can be attached to
function types to indicate that the function shares ZA with its
caller.  These are:

- arm::in("za")
- arm::out("za")
- arm::inout("za")
- arm::preserves("za")

If a function's type has one of these shared-ZA attributes,
PSTATE.ZA is specified to be 1 on entry to the function and on return
from the function.  Otherwise, the caller and callee have separate
ZA contexts; they do not use ZA to share data.

Although normal non-shared-ZA functions have a separate ZA context
from their callers, nested uses of ZA are expected to be rare.
The ABI therefore defines a cooperative lazy saving scheme that
allows saves and restore of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)

Functions that want to use ZA internally have an arm::new("za")
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body.  It also tells the compiler
to commit any lazy save initiated by a caller.
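
A sketch of how this looks in source form (illustrative only, using the
__arm_* keyword spellings mentioned below):

  /* Callee that updates ZA state shared with its caller.  */
  void accumulate (void) __arm_inout("za");

  /* Callee that only reads the shared ZA state.  */
  void consume (void) __arm_in("za");

  /* Function with its own ZA state: PSTATE.ZA is enabled for the body
     and any lazy save started by a caller is committed first.  */
  __arm_new("za") void compute (void)
  {
    accumulate ();
    consume ();
  }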

The patch uses various abstract hard registers to track dataflow
relating to ZA.  See the comments in the patch for details.

The lazy save scheme is intended to be transparent to most normal
functions, so that they don't need to be recompiled for SME.
This is reflected in the way that most normal functions ignore
the new hard registers added in the patch.

As with arm::streaming and arm::streaming_compatible, the attributes are
also available as __arm_<attr>.  This has two advantages: it triggers an
error on compilers that don't understand the attributes, and it eases
use in C, where [[...]] attributes were only added in C23.

gcc/
* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
(aarch64_output_rdsvl, aarch64_optimize_mode_switching)
(aarch64_restore_za): Declare.
* config/aarch64/constraints.md (UsR): New constraint.
* config/aarch64/aarch64.md (LOWERING_REGNUM, TPIDR_BLOCK_REGNUM)
(SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, ZA_FREE_REGNUM)
(ZA_SAVED_REGNUM, ZA_REGNUM, FIRST_FAKE_REGNUM): New constants.
(LAST_FAKE_REGNUM): Likewise.
(UNSPEC_SAVE_NZCV, UNSPEC_RESTORE_NZCV, UNSPEC_SME_VQ): New unspecs.
(arches): Add sme.
(arch_enabled): Handle it.
(*cb<optab><mode>1): Rename to...
(aarch64_cb<optab><mode>1): ...this.
(*movsi_aarch64): Add an alternative for RDSVL.
(*movdi_aarch64): Likewise.
(aarch64_save_nzcv, aarch64_restore_nzcv): New insns.
* config/aarch64/aarch64-sme.md (UNSPEC_SMSTOP_ZA)
(UNSPEC_INITIAL_ZERO_ZA, UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE)
(UNSPEC_READ_TPIDR2, UNSPEC_WRITE_TPIDR2, UNSPEC_SETUP_LOCAL_TPIDR2)
(UNSPEC_RESTORE_ZA, UNSPEC_START_PRIVATE_ZA_CALL): New unspecs.
(UNSPEC_END_PRIVATE_ZA_CALL, UNSPEC_COMMIT_LAZY_SAVE): Likewise.
(UNSPECV_ASM_UPDATE_ZA): New unspecv.
(aarch64_tpidr2_save, aarch64_smstart_za, aarch64_smstop_za)
(aarch64_initial_zero_za, aarch64_setup_local_tpidr2)
(aarch64_clear_tpidr2, aarch64_write_tpidr2, aarch64_read_tpidr2)
(aarch64_tpidr2_restore, aarch64_restore_za, aarch64_asm_update_za)
(aarch64_start_private_za_call, aarch64_end_private_za_call)
(aarch64_commit_lazy_save): New patterns.
* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
(FIXED_REGISTERS, REGISTER_NAMES): Add the new fake ZA registers.
(CALL_USED_REGISTERS): Replace with...
(CALL_REALLY_USED_REGISTERS): ...this and add the fake ZA registers.
(FIRST_PSEUDO_REGISTER): Bump to include the fake ZA registers.
(FAKE_REGS): New register class.
(REG_CLASS_NAMES): Update accordingly.
(REG_CLASS_CONTENTS): Likewise.
(machine_function::tpidr2_block): New member variable.
(machine_function::tpidr2_block_ptr): Likewise.
(machine_function::za_save_buffer): Likewise.
(machine_function::next_asm_update_za_id): Likewise.
(CUMULATIVE_ARGS::shared_za_flags): Likewise.
(aarch64_mode_entity, aarch64_local_sme_state): New enums.
(aarch64_tristate_mode): Likewise.
(OPTIMIZE_MODE_SWITCHING, NUM_MODES_FOR_MODE_SWITCHING): Define.
* config/aarch64/aarch64.cc (AARCH64_STATE_SHARED, AARCH64_STATE_IN)
(AARCH64_STATE_OUT): New constants.
(aarch64_attribute_shared_state_flags): New function.
(aarch64_lookup_shared_state_flags, aarch64_fndecl_has_new_state)
(aarch64_check_state_string, cmp_string_csts): Likewise.
(aarch64_merge_string_arguments, aarch64_check_arm_new_against_type)
(handle_arm_new, handle_arm_shared): Likewise.
(handle_arm_new_za_attribute): New function.
(aarch64_arm_attribute_table): Add new, preserves, in, out, and inout.
(aarch64_hard_regno_nregs): Handle FAKE_REGS.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_fntype_shared_flags, aarch64_fntype_pstate_za): New functions.
(aarch64_fntype_isa_mode): Include aarch64_fntype_pstate_za.
(aarch64_fndecl_has_state, aarch64_fndecl_pstate_za): New functions.
(aarch64_fndecl_isa_mode): Include aarch64_fndecl_pstate_za.
(aarch64_cfun_incoming_pstate_za, aarch64_cfun_shared_flags)
(aarch64_cfun_has_new_state, aarch64_cfun_has_state): New functions.
(aarch64_sme_vq_immediate, aarch64_sme_vq_unspec_p): Likewise.
(aarch64_rdsvl_immediate_p, aarch64_output_rdsvl): Likewise.
(aarch64_expand_mov_immediate): Handle RDSVL immediates.
(aarch64_function_arg): Add the ZA sharing flags as a third limb
of the PARALLEL.
(aarch64_init_cumulative_args): Record the ZA sharing flags.
(aarch64_extra_live_on_entry): New function.  Handle the new
ZA-related fake registers.
(aarch64_epilogue_uses): Handle the new ZA-related fake registers.
(aarch64_cannot_force_const_mem): Handle UNSPEC_SME_VQ constants.
(aarch64_get_tpidr2_block, aarch64_get_tpidr2_ptr): New functions.
(aarch64_init_tpidr2_block, aarch64_restore_za): Likewise.
(aarch64_layout_frame): Check whether the current function creates
new ZA state.  Record that it clobbers LR if so.
(aarch64_expand_prologue): Handle functions that create new ZA state.
(aarch64_expand_epilogue): Likewise.
(aarch64_create_tpidr2_block): New function.
(aarch64_restore_za): Likewise.
(aarch64_start_call_args): Disallow calls to shared-ZA functions
from functions that have no ZA state.  Emit a marker instruction
before calls to private-ZA functions from functions that have
SME state.
(aarch64_expand_call): Add return registers for state that is
managed via attributes.  Record the use and clobber information
for the ZA registers.
(aarch64_end_call_args): New function.
(aarch64_regno_regclass): Handle FAKE_REGS.
(aarch64_class_max_nregs): Likewise.
(aarch64_override_options_internal): Require TARGET_SME for
functions that have ZA state.
(aarch64_conditional_register_usage): Handle FAKE_REGS.
(aarch64_mov_operand_p): Handle RDSVL immediates.
(aarch64_comp_type_attributes): Check that the ZA sharing flags
are equal.
(aarch64_merge_decl_attributes): New function.
(aarch64_optimize_mode_switching, aarch64_mode_emit_za_save_buffer)
(aarch64_mode_emit_local_sme_state, aarch64_mode_emit):  Likewise.
(aarch64_insn_references_sme_state_p): Likewise.
(aarch64_mode_needed_local_sme_state): Likewise.
(aarch64_mode_needed_za_save_buffer, aarch64_mode_needed): Likewise.
(aarch64_mode_after_local_sme_state, aarch64_mode_after): Likewise.
(aarch64_local_sme_confluence, aarch64_mode_confluence): Likewise.
(aarch64_one_shot_backprop, aarch64_local_sme_backprop): Likewise.
(aarch64_mode_backprop, aarch64_mode_entry): Likewise.
(aarch64_mode_exit, aarch64_mode_eh_handler): Likewise.
(aarch64_mode_priority, aarch64_md_asm_adjust): Likewise.
(TARGET_END_CALL_ARGS, TARGET_MERGE_DECL_ATTRIBUTES): Define.
(TARGET_MODE_EMIT, TARGET_MODE_NEEDED, TARGET_MODE_AFTER): Likewise.
(TARGET_MODE_CONFLUENCE, TARGET_MODE_BACKPROP): Likewise.
(TARGET_MODE_ENTRY, TARGET_MODE_EXIT): Likewise.
(TARGET_MODE_EH_HANDLER, TARGET_MODE_PRIORITY): Likewise.
(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
(TARGET_MD_ASM_ADJUST): Use aarch64_md_asm_adjust.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_new, __arm_preserves, __arm_in, __arm_out, and __arm_inout.

gcc/testsuite/
* gcc.target/aarch64/sme/za_state_1.c: New test.
* gcc.target/aarch64/sme/za_state_2.c: Likewise.
* gcc.target/aarch64/sme/za_state_3.c: Likewise.
* gcc.target/aarch64/sme/za_state_4.c: Likewise.
* gcc.target/aarch64/sme/za_state_5.c: Likewise.
* gcc.target/aarch64/sme/za_state_6.c: Likewise.
* g++.target/aarch64/sme/exceptions_1.C: Likewise.
* gcc.target/aarch64/sme/keyword_macros_1.c: Add ZA macros.
* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.

5 months agoaarch64: Switch PSTATE.SM around calls
Richard Sandiford [Tue, 5 Dec 2023 10:11:25 +0000 (10:11 +0000)] 
aarch64: Switch PSTATE.SM around calls

This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion.  (It doesn't use md_reorg because later additions
need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.
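
For example (sketch only), a streaming-compatible function that calls a
normal non-streaming callee gets a conditional SMSTOP SM before the call
and a conditional SMSTART SM afterwards:

  void plain_callee (void);                   /* needs PSTATE.SM == 0 */

  void wrapper (void) __arm_streaming_compatible
  {
    /* If PSTATE.SM happens to be 1 here, the compiler switches it off
       around the call and restores it afterwards.  */
    plain_callee ();
  }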

gcc/
* config/aarch64/aarch64-passes.def
(pass_late_thread_prologue_and_epilogue): New pass.
* config/aarch64/aarch64-sme.md: New file.
* config/aarch64/aarch64.md: Include it.
(*tb<optab><mode>1): Rename to...
(@aarch64_tb<optab><mode>): ...this.
(call, call_value, sibcall, sibcall_value): Don't require operand 2
to be a CONST_INT.
* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
the insn.
(make_pass_switch_sm_state): Declare.
* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
(CALL_USED_REGISTER): Mark VG as call-preserved.
(aarch64_frame::old_svcr_offset): New member variable.
(machine_function::call_switches_sm_state): Likewise.
(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
(aarch64_cfun_incoming_pstate_sm): New function.
(aarch64_call_switches_pstate_sm): Likewise.
(aarch64_reg_save_mode): Return DImode for VG_REGNUM.
(aarch64_callee_isa_mode): New function.
(aarch64_insn_callee_isa_mode): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_switch_pstate_sm): Likewise.
(aarch64_sme_mode_switch_regs): New class.
(aarch64_record_sme_mode_switch_args): New function.
(aarch64_finish_sme_mode_switch_args): Likewise.
(aarch64_function_arg): Handle the end marker by returning a
PARALLEL that contains the ABI cookie that we used previously
alongside the result of aarch64_finish_sme_mode_switch_args.
(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
(aarch64_function_arg_advance): If a call would switch SM state,
record all argument registers that would need to be saved around
the mode switch.
(aarch64_need_old_pstate_sm): New function.
(aarch64_layout_frame): Decide whether the frame needs to store the
incoming value of PSTATE.SM and allocate a save slot for it if so.
If a function switches SME state, arrange to save the old value
of the DWARF VG register.  Handle the case where this is the only
register save slot above the FP.
(aarch64_save_callee_saves): Handle saves of the DWARF VG register.
(aarch64_get_separate_components): Prevent such saves from being
shrink-wrapped.
(aarch64_old_svcr_mem): New function.
(aarch64_read_old_svcr): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_expand_prologue): Handle saves of the DWARF VG register.
Initialize any SVCR save slot.
(aarch64_expand_call): Allow the cookie to be PARALLEL that contains
both the UNSPEC_CALLEE_ABI value and a list of registers that need
to be preserved across a change to PSTATE.SM.  If the call does
involve such a change to PSTATE.SM, record the registers that
would be clobbered by this process.  Also emit an instruction
to mark the temporary change in VG.  Update call_switches_pstate_sm.
(aarch64_emit_call_insn): Return the emitted instruction.
(aarch64_frame_pointer_required): New function.
(aarch64_conditional_register_usage): Prevent VG_REGNUM from being
treated as a register operand.
(aarch64_switch_pstate_sm_for_call): New function.
(pass_data_switch_pstate_sm): New pass variable.
(pass_switch_pstate_sm): New pass class.
(make_pass_switch_pstate_sm): New function.
(TARGET_FRAME_POINTER_REQUIRED): Define.
* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
* gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_9.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_10.c: Likewise.

5 months agoaarch64: Mark relevant SVE instructions as non-streaming
Richard Sandiford [Tue, 5 Dec 2023 10:11:24 +0000 (10:11 +0000)] 
aarch64: Mark relevant SVE instructions as non-streaming

Following on from the previous Advanced SIMD patch, this one
divides SVE instructions into non-streaming and streaming-
compatible groups.

gcc/
* config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro.
(TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it.
(TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def: Separate out
the functions that require PSTATE.SM to be 0 and guard them
with AARCH64_FL_SM_OFF.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
* config/aarch64/aarch64-sve-builtins.cc (check_required_extensions):
Enforce AARCH64_FL_SM_OFF requirements.
* config/aarch64/aarch64-sve.md (aarch64_wrffr): Require
TARGET_NON_STREAMING.
(aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise.
(*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc)
(@aarch64_ld<fn>f1<mode>): Likewise.
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>)
(gather_load<mode><v_int_container>): Likewise.
(mask_gather_load<mode><v_int_container>): Likewise.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>)
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked)
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>, @aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>)
(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>)
(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw)
(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw)
(scatter_store<mode><v_int_container>): Likewise.
(mask_scatter_store<mode><v_int_container>): Likewise.
(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked)
(*mask_scatter_store<mode><v_int_container>_sxtw): Likewise.
(*mask_scatter_store<mode><v_int_container>_uxtw): Likewise.
(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw)
(@aarch64_sve_ld1ro<mode>, @aarch64_adr<mode>): Likewise.
(*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise.
(*aarch64_adr_uxtw_and, @aarch64_adr<mode>_shift): Likewise.
(*aarch64_adr<mode>_shift, *aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw, @aarch64_sve_add_<optab><vsi2qi>): Likewise.
(@aarch64_sve_<sve_fp_op><mode>, fold_left_plus_<mode>): Likewise.
(mask_fold_left_plus_<mode>, @aarch64_sve_compact<mode>): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>)
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
(@aarch64_sve2_histcnt<mode>, @aarch64_sve2_histseg<mode>): Likewise.
(@aarch64_pred_<SVE2_MATCH:sve_int_op><mode>): Likewise.
(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_cc): Likewise.
(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_ptest): Likewise.
* config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA
depend on TARGET_NON_STREAMING.
(SVE_BFLOAT_TERNARY_LONG): Likewise for BFMMLA.

gcc/testsuite/
* g++.target/aarch64/sve/aarch64-ssve.exp: New harness.
* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add
-DSTREAMING_COMPATIBLE to the list of options.
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
Fix pasto in variable name.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions
as streaming-compatible if STREAMING_COMPATIBLE is defined.
* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for
streaming-compatible code.
* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise.

5 months agoaarch64: Distinguish streaming-compatible AdvSIMD insns
Richard Sandiford [Tue, 5 Dec 2023 10:11:24 +0000 (10:11 +0000)] 
aarch64: Distinguish streaming-compatible AdvSIMD insns

The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_<mode>high, but I'd rather provide it
than not.
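
A sketch of the kind of streaming code this is aimed at: only the
streaming-compatible (TARGET_BASE_SIMD) subset is available, so the copy
below has to use the base load/store/move instructions (or SVE moves)
rather than full Advanced SIMD:

  #include <arm_neon.h>

  void copy_vec (int32x4_t *dst, const int32x4_t *src) __arm_streaming
  {
    *dst = *src;   /* data move only; no non-streaming SIMD arithmetic */
  }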

gcc/
* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
(TARGET_SIMD): Require PSTATE.SM to be 0.
(AARCH64_ISA_SM_OFF): New macro.
* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
(aarch64_print_operand): Support '%Z'.
(aarch64_secondary_reload): Expect SVE moves to be used for
Advanced SIMD modes if SVE is enabled and non-streaming
Advanced SIMD isn't.
(aarch64_register_move_cost): Likewise.
(aarch64_simd_container_mode): Extend Advanced SIMD mode
handling to TARGET_BASE_SIMD.
(aarch64_expand_cpymem): Expand commentary.
* config/aarch64/aarch64.md (arches): Add base_simd and nobase_simd.
(arch_enabled): Handle it.
(*mov<mode>_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
(*movti_aarch64): Use an SVE move instruction if non-streaming
SIMD isn't available.
(*mov<TFD:mode>_aarch64): Likewise.
(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
(store_pair_dw_tftf): Likewise.
(loadwb_pair<TX:mode>_<P:mode>): Likewise.
(storewb_pair<TX:mode>_<P:mode>): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Allow UMOV in streaming mode.
(*aarch64_simd_mov<VQMOV:mode>): Use an SVE move instruction
if non-streaming SIMD isn't available.
(aarch64_store_lane0<mode>): Depend on TARGET_FLOAT rather than
TARGET_SIMD.
(aarch64_simd_mov_from_<mode>low): Likewise.  Use fmov if
Advanced SIMD is completely disabled.
(aarch64_simd_mov_from_<mode>high): Use SVE EXT instructions if
non-streaming SIMD isn't available.

gcc/testsuite/
* gcc.target/aarch64/movdf_2.c: New test.
* gcc.target/aarch64/movdi_3.c: Likewise.
* gcc.target/aarch64/movhf_2.c: Likewise.
* gcc.target/aarch64/movhi_2.c: Likewise.
* gcc.target/aarch64/movqi_2.c: Likewise.
* gcc.target/aarch64/movsf_2.c: Likewise.
* gcc.target/aarch64/movsi_2.c: Likewise.
* gcc.target/aarch64/movtf_3.c: Likewise.
* gcc.target/aarch64/movtf_4.c: Likewise.
* gcc.target/aarch64/movti_3.c: Likewise.
* gcc.target/aarch64/movti_4.c: Likewise.
* gcc.target/aarch64/movv16qi_4.c: Likewise.
* gcc.target/aarch64/movv16qi_5.c: Likewise.
* gcc.target/aarch64/movv8qi_4.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.

5 months agoaarch64: Add +sme
Richard Sandiford [Tue, 5 Dec 2023 10:11:23 +0000 (10:11 +0000)] 
aarch64: Add +sme

This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code.  (arm_streaming_compatible code
does not necessarily assume the presence of SME.  It just has to
work when SME is present and streaming mode is enabled.)
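
A sketch of the intended use (the -march baseline below is just an
example):

  /* Compile with e.g. -march=armv9-a+sme; without +sme, the body of a
     streaming function like this is rejected.  */
  void kernel (void) __arm_streaming
  {
    /* Compiled with the streaming ISA subset enabled.  */
  }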

gcc/
* doc/invoke.texi: Document SME.
* doc/sourcebuild.texi: Document aarch64_sme.
* config/aarch64/aarch64-option-extensions.def (sme): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro.
(TARGET_SME): Likewise.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Ensure that SME is present when compiling streaming code.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme): New
target test.
* gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled
if it isn't by default.
* g++.target/aarch64/sme/aarch64-sme.exp: Likewise.
* gcc.target/aarch64/sme/streaming_mode_3.c: New test.

5 months agoaarch64: Add arm_streaming(_compatible) attributes
Richard Sandiford [Tue, 5 Dec 2023 10:11:23 +0000 (10:11 +0000)] 
aarch64: Add arm_streaming(_compatible) attributes

This patch adds support for recognising the SME arm::streaming
and arm::streaming_compatible attributes.  These attributes
respectively describe whether the processor is definitely in
"streaming mode" (PSTATE.SM==1), whether the processor is
definitely not in streaming mode (PSTATE.SM==0), or whether
we don't know at compile time either way.

As far as the compiler is concerned, this effectively creates three
ISA submodes: streaming mode enables things that are not available
in non-streaming mode, non-streaming mode enables things that are not
available in streaming mode, and streaming-compatible mode has to stick
to the common subset.  This means that some instructions are conditional
on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.

I wondered about recording the streaming state in a new variable.
However, the set of available instructions is also influenced by
PSTATE.ZA (added later), so I think it makes sense to view this
as an instance of a more general mechanism.  Also, keeping the
PSTATE.SM state in the same flag variable as the other ISA
features makes it possible to sum up the requirements of an
ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-
level features), these ISA modes are not controlled directly by
command-line parameters or "target" attributes.

arm::streaming and arm::streaming_compatible are function type attributes
rather than function declaration attributes.  This means that we need
to find somewhere to copy the type information across to a function's
target options.  The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires
to be active on entry.  (The same mode is then active on return.)
The patch extends the current UNSPEC_CALLEE_ABI cookie to include
this information, as well as the PCS variant that it recorded
previously.

The attributes can also be written __arm_streaming and
__arm_streaming_compatible.  This has two advantages: it triggers
an error on compilers that don't understand the attributes, and it
eases use in C, where [[...]] attributes were only added in C23.
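
For illustration (not from the patch), the three cases look like this in
source form:

  void s_fn (void) __arm_streaming;              /* PSTATE.SM is 1 on entry */
  void ns_fn (void);                             /* ordinary, non-streaming  */

  void sc_fn (void) __arm_streaming_compatible   /* must work in either mode */
  {
    s_fn ();    /* may need a switch into streaming mode around the call */
    ns_fn ();   /* likewise, in the other direction */
  }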

gcc/
* config/aarch64/aarch64-isa-modes.def: New file.
* config/aarch64/aarch64.h: Include it in the feature enumerations.
(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
(AARCH64_ISA_MODE): New macro.
(CUMULATIVE_ARGS): Add an isa_mode field.
* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
* config/aarch64/aarch64.cc (attr_streaming_exclusions)
(aarch64_gnu_attributes, aarch64_gnu_attribute_table)
(aarch64_arm_attributes, aarch64_arm_attribute_table): New tables.
(aarch64_attribute_table): Redefine to include the gnu and arm
attributes.
(aarch64_fntype_pstate_sm, aarch64_fntype_isa_mode): New functions.
(aarch64_fndecl_pstate_sm, aarch64_fndecl_isa_mode): Likewise.
(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
(aarch64_init_cumulative_args): Initialize the isa_mode field.
(aarch64_output_mi_thunk): Use aarch64_gen_callee_cookie to get
the ABI cookie.
(aarch64_override_options): Add the ISA mode to the feature set.
(aarch64_temporary_target::copy_from_fndecl): Likewise.
(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
(aarch64_set_current_function): Maintain the correct ISA mode.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
(aarch64_comp_type_attributes): Handle arm::streaming and
arm::streaming_compatible.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_streaming and __arm_streaming_compatible.
* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Use
aarch64_gen_callee_cookie to get the ABI cookie.
* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
* gcc.target/aarch64/sme/keyword_macros_1.c: Likewise.
* g++.target/aarch64/sme/aarch64-sme.exp: New harness.
* g++.target/aarch64/sme/streaming_mode_1.C: New test.
* g++.target/aarch64/sme/streaming_mode_2.C: Likewise.
* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
to contain 1 (const_int 0), not 2.

5 months agoaarch64: Add tuple forms of svreinterpret
Richard Sandiford [Tue, 5 Dec 2023 10:11:22 +0000 (10:11 +0000)] 
aarch64: Add tuple forms of svreinterpret

SME2 adds a number of intrinsics that operate on tuples of 2 and 4
vectors.  The ACLE therefore extends the existing svreinterpret
intrinsics to handle tuples as well.
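
A sketch of what the tuple forms look like (assuming the ACLE naming
with an _x2/_x4 group suffix appended to the existing intrinsic names):

  #include <arm_sve.h>

  svuint8x2_t
  reinterpret_pair (svint32x2_t x)
  {
    /* Same lane contents, reinterpreted two vectors at a time.  */
    return svreinterpret_u8_s32_x2 (x);
  }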

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Punt on tuple forms.
(svreinterpret_impl::expand): Use tuple_mode instead of vector_mode.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret):
Extend to x1234 groups.
* config/aarch64/aarch64-sve-builtins-functions.h
(multi_vector_function::vectors_per_tuple): If the function has
a group suffix, get the number of vectors from there.
* config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def)
(reinterpret): New function shape.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle
DEF_SVE_FUNCTION_GS.
* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New
macro.
(DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default.
* config/aarch64/aarch64-sve-builtins.h
(function_instance::tuple_mode): New member function.
(function_base::vectors_per_tuple): Take the function instance
as argument and get the number from the group suffix.
(function_instance::vectors_per_tuple): Update accordingly.
* config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4)
(SVE_ALL_STRUCT): New mode iterators.
(SVE_STRUCT): Redefine in terms of SVE_FULL*.
* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>)
(*aarch64_sve_reinterpret<mode>): Extend to SVE structure modes.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN):
New macro.
* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for
tuple forms.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise.

5 months agoaarch64: Tweak error message for (tuple,vector) pairs
Richard Sandiford [Tue, 5 Dec 2023 10:11:22 +0000 (10:11 +0000)] 
aarch64: Tweak error message for (tuple,vector) pairs

SME2 adds more intrinsics that take a tuple of vectors followed
by a single vector, with the two arguments expected to have the
same element type.  Unlike with the existing svset* intrinsics,
the size of the tuple is not fixed by the overloaded function name.

This patch adds an error message that (hopefully) copes better
with that combination.

gcc/
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::require_derived_vector_type): Add a specific
error message for the case in which the caller wants a single
vector whose element type matches a previous tuple argument.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/set_1.c: Tweak expected
error message.
* gcc.target/aarch64/sve/acle/general-c/set_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/set_5.c: Likewise.

5 months agoaarch64: Make more use of sve_type in ACLE code
Richard Sandiford [Tue, 5 Dec 2023 10:11:21 +0000 (10:11 +0000)] 
aarch64: Make more use of sve_type in ACLE code

This patch makes some functions operate on sve_type, rather than just
on type suffixes.  It also allows an overload to be resolved based on
a mode and sve_type.  In this case the sve_type is used to derive the
group size as well as a type suffix.

This is needed for the SME2 intrinsics and the new tuple forms of
svreinterpret.  No functional change intended on its own.

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_resolver::lookup_form): Add an overload that takes
an sve_type rather than type and group suffixes.
(function_resolver::resolve_to): Likewise.
(function_resolver::infer_vector_or_tuple_type): Return an sve_type.
(function_resolver::infer_tuple_type): Likewise.
(function_resolver::require_matching_vector_type): Take an sve_type
rather than a type_suffix_index.
(function_resolver::require_derived_vector_type): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (num_vectors_to_group):
New function.
(function_resolver::lookup_form): Add an overload that takes
an sve_type rather than type and group suffixes.
(function_resolver::resolve_to): Likewise.
(function_resolver::infer_vector_or_tuple_type): Return an sve_type.
(function_resolver::infer_tuple_type): Likewise.
(function_resolver::infer_vector_type): Update accordingly.
(function_resolver::require_matching_vector_type): Take an sve_type
rather than a type_suffix_index.
(function_resolver::require_derived_vector_type): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc (get_def::resolve)
(set_def::resolve, store_def::resolve, tbl_tuple_def::resolve): Update
calls accordingly.

5 months agoaarch64: Replace vague "previous arguments" message
Richard Sandiford [Tue, 5 Dec 2023 10:11:21 +0000 (10:11 +0000)] 
aarch64: Replace vague "previous arguments" message

If an SVE ACLE intrinsic requires two arguments to have the
same type, the C resolver would report mismatches as "argument N
has type T2, but previous arguments had type T1".  This patch makes
the message say which argument had type T1.

This is needed to give decent error messages for some SME cases.

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_resolver::require_matching_vector_type): Add a parameter
that specifies the number of the earlier argument that is being
matched against.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::require_matching_vector_type): Likewise.
(require_derived_vector_type): Update calls accordingly.
(function_resolver::resolve_unary): Likewise.
(function_resolver::resolve_uniform): Likewise.
(function_resolver::resolve_uniform_opt_n): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(binary_long_lane_def::resolve): Likewise.
(clast_def::resolve, ternary_uint_def::resolve): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/*: Replace "but previous
arguments had" with "but argument N had".

5 months agoaarch64: Generalise some SVE ACLE error messages
Richard Sandiford [Tue, 5 Dec 2023 10:11:20 +0000 (10:11 +0000)] 
aarch64: Generalise some SVE ACLE error messages

The current SVE ACLE function-resolution diagnostics assume
that a function has a fixed choice between vectors or tuples
of vectors.  If an argument was not an SVE type at all, the
error message said the function "expects an SVE vector type"
or "expects an SVE tuple type".

This patch generalises the error to cope with cases where
an argument can be either a vector or a tuple.  It also splits
out the diagnostics for mismatched tuple sizes, so that they
can be reused by later patches.

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_resolver::infer_sve_type): New member function.
(function_resolver::report_incorrect_num_vectors): Likewise.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::infer_sve_type): New function.
(function_resolver::report_incorrect_num_vectors): New function,
split out from...
(function_resolver::infer_vector_or_tuple_type): ...here.  Use
infer_sve_type.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/*: Update expected error
messages.

5 months agoaarch64: Add sve_type to SVE builtins code
Richard Sandiford [Tue, 5 Dec 2023 10:11:20 +0000 (10:11 +0000)] 
aarch64: Add sve_type to SVE builtins code

Until now, the SVE ACLE code had mostly been able to represent
individual SVE arguments with just an element type suffix (s32, u32,
etc.).  However, the SME2 ACLE provides many overloaded intrinsics
that operate on tuples rather than single vectors.  This patch
therefore adds a new type (sve_type) that combines an element
type suffix with a vector count.  This is enough to uniquely
represent all SVE ACLE types.
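
A rough sketch of the idea (the exact definition in
aarch64-sve-builtins.h may differ; the field names here are
assumptions):

  struct sve_type
  {
    /* Element type suffix, e.g. the index for s32 or u8.  */
    type_suffix_index type;
    /* Number of vectors in the tuple (1 for a single vector).  */
    unsigned int num_vectors;
  };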

gcc/
* config/aarch64/aarch64-sve-builtins.h (sve_type): New struct.
(sve_type::operator==): New function.
(function_resolver::get_vector_type): Delete.
(function_resolver::report_no_such_form): Take an sve_type rather
than a type_suffix_index.
* config/aarch64/aarch64-sve-builtins.cc (get_vector_type): New
function.
(function_resolver::get_vector_type): Delete.
(function_resolver::report_no_such_form): Take an sve_type rather
than a type_suffix_index.
(find_sve_type): New function, split out from...
(function_resolver::infer_vector_or_tuple_type): ...here.

5 months agoaarch64: Add group suffixes to SVE intrinsics
Richard Sandiford [Tue, 5 Dec 2023 10:11:19 +0000 (10:11 +0000)] 
aarch64: Add group suffixes to SVE intrinsics

The SME2 ACLE adds a new "group" suffix component to the naming
convention for SVE intrinsics.  This is also used in the new tuple
forms of the svreinterpret intrinsics.

This patch adds support for group suffixes and defines the
x2, x3 and x4 suffixes that are needed for the svreinterprets.

gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Take
a group suffix index parameter.
(build_32_64, build_all): Update accordingly.  Iterate over all
group suffixes.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svqrshl_impl::fold)
(svqshl_impl::fold, svrshl_impl::fold): Update function_instance
constructors.
* config/aarch64/aarch64-sve-builtins.cc (group_suffixes): New array.
(groups_none): New constant.
(function_groups): Initialize the groups field.
(function_instance::hash): Hash the group index.
(function_builder::get_name): Add the group suffix.
(function_builder::add_overloaded_functions): Iterate over all
group suffixes.
(function_resolver::lookup_form): Take a group suffix parameter.
(function_resolver::resolve_to): Likewise.
* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_GROUP_SUFFIX): New
macro.
(x2, x3, x4): New group suffixes.
* config/aarch64/aarch64-sve-builtins.h (group_suffix_index): New enum.
(group_suffix_info): New structure.
(function_group_info::groups): New member variable.
(function_instance::group_suffix_id): Likewise.
(group_suffixes): New array.
(function_instance::operator==): Compare the group suffixes.
(function_instance::group_suffix): New function.

5 months agoaarch64: Make AARCH64_FL_SVE requirements explicit
Richard Sandiford [Tue, 5 Dec 2023 10:11:19 +0000 (10:11 +0000)] 
aarch64: Make AARCH64_FL_SVE requirements explicit

So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code.  It's therefore necessary to make
the implicit SVE requirement explicit.

gcc/
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove
implied requirement on SVE.
* config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.

5 months agoaarch64: Use SVE's RDVL instruction
Richard Sandiford [Tue, 5 Dec 2023 10:11:18 +0000 (10:11 +0000)] 
aarch64: Use SVE's RDVL instruction

We didn't previously use SVE's RDVL instruction, since the CNT*
forms are preferred and provide most of the range.  However,
there are some cases that RDVL can handle and CNT* can't,
and using RDVL-like instructions becomes important for SME.

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_rdvl_immediate_p)
(aarch64_output_sve_rdvl): Declare.
* config/aarch64/aarch64.cc (aarch64_sve_cnt_factor_p): New
function, split out from...
(aarch64_sve_cnt_immediate_p): ...here.
(aarch64_sve_rdvl_factor_p): New function.
(aarch64_sve_rdvl_immediate_p): Likewise.
(aarch64_output_sve_rdvl): Likewise.
(aarch64_offset_temporaries): Rewrite the SVE handling to use RDVL
for some cases.
(aarch64_expand_mov_immediate): Handle RDVL immediates.
(aarch64_mov_operand_p): Likewise.
* config/aarch64/constraints.md (Usr): New constraint.
* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64): Add an RDVL
alternative.
(*movsi_aarch64, *movdi_aarch64): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/cntb.c: Tweak expected output.
* gcc.target/aarch64/sve/acle/asm/cnth.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntd.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfb.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
* gcc.target/aarch64/sve/loop_add_4.c: Expect RDVL to be used
to calculate the -17 and 17 factors.
* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise the 18 factor.

5 months agoaarch64: Generalise require_immediate_lane_index
Richard Sandiford [Tue, 5 Dec 2023 10:11:18 +0000 (10:11 +0000)] 
aarch64: Generalise require_immediate_lane_index

require_immediate_lane_index previously hard-coded the assumption
that the group size is determined by the argument immediately before
the index.  However, for SME, there are cases where it should be
determined by an earlier argument instead.

gcc/
* config/aarch64/aarch64-sve-builtins.h:
(function_checker::require_immediate_lane_index): Add an argument
for the index of the indexed vector argument.
* config/aarch64/aarch64-sve-builtins.cc
(function_checker::require_immediate_lane_index): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_bfloat_lane_base::check): Update accordingly.
(ternary_qq_lane_base::check): Likewise.
(binary_lane_def::check): Likewise.
(binary_long_lane_def::check): Likewise.
(ternary_lane_def::check): Likewise.
(ternary_lane_rotate_def::check): Likewise.
(ternary_long_lane_def::check): Likewise.
(ternary_qq_lane_rotate_def::check): Likewise.

5 months agoada: Fix Ada bootstrap on Solaris
Rainer Orth [Tue, 5 Dec 2023 10:08:05 +0000 (11:08 +0100)] 
ada: Fix Ada bootstrap on Solaris

The recent warning patches broke Ada bootstrap on Solaris:

adaint.c: In function '__gnat_kill':
adaint.c:3597:3: error: implicit declaration of function 'kill'
[-Wimplicit-function-declaration]
 3597 |   kill (pid, sig);
      |   ^~~~

expect.c: In function '__gnat_expect_poll':
expect.c:409:5: error: implicit declaration of function 'memset'
[-Wimplicit-function-declaration]
  409 |     FD_ZERO (&rset);
      |     ^~~~~~~
expect.c:55:1: note: include '<string.h>' or provide a declaration of 'memset'
   54 | #include <sys/wait.h>
  +++ |+#include <string.h>
   55 | #endif

I'm now including the necessary headers: <signal.h> for kill and
<string.h> for memset.

Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, x86_64-pc-linux-gnu, and
x86_64-apple-darwin23.1.0.

2023-12-03  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/ada:
* adaint.c: Include <signal.h>.
* expect.c: Include <string.h>.

5 months agogm2: Fix mc/mc.flex compilation on Solaris
Rainer Orth [Tue, 5 Dec 2023 10:06:04 +0000 (11:06 +0100)] 
gm2: Fix mc/mc.flex compilation on Solaris

The recent warning changes broke gm2 bootstrap on Solaris:

/vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex: In function 'handleFile':
/vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex:297:21: error: implicit
declaration of function 'alloca' [-Wimplicit-function-declaration]
  297 |   char *s = (char *)alloca (strlen (filename) + 2 + 1);
      |                     ^~~~~~

alloca needs <alloca.h> on Solaris, which isn't universally available.
Since mc.flex doesn't include any config header, I chose to switch to
__builtin_alloca instead.

/vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex:332:19: error: implicit
declaration of function 'index' [-Wimplicit-function-declaration]
  332 |   char   *p     = index(sdate, '\n');
      |                   ^~~~~

index is declared in <strings.h> on Solaris, again not a standard
header.  I simply switched to using strchr to avoid that issue.
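
A rough sketch of the two changes (illustrative only, not the literal
mc.flex code):

  #if defined(__GNUC__)
  #define alloca __builtin_alloca
  #endif
  #include <string.h>

  static char *
  newline_pos (char *sdate)
  {
    /* strchr is standard, unlike index (), which needs the
       non-standard <strings.h> on Solaris.  */
    return strchr (sdate, '\n');
  }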

Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, x86_64-pc-linux-gnu, and
x86_64-apple-darwin23.1.0.

2023-12-03  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/m2:
* mc/mc.flex [__GNUC__]: Define alloca as __builtin_alloca.
(handleDate): Use strchr instead of index.

5 months agolibiberty: Fix pex_unix_wait return type
Rainer Orth [Tue, 5 Dec 2023 10:04:06 +0000 (11:04 +0100)] 
libiberty: Fix pex_unix_wait return type

The recent warning patches broke Solaris bootstrap:

/vol/gcc/src/hg/master/local/libiberty/pex-unix.c:326:3: error: initialization of 'pid_t (*)(struct pex_obj *, pid_t,  int *, struct pex_time *, int,  const char **, int *)' {aka 'long int (*)(struct pex_obj *, long int,  int *, struct pex_time *, int,  const char **, int *)'} from incompatible pointer type 'int (*)(struct pex_obj *, pid_t,  int *, struct pex_time *, int,  const char **, int *)' {aka 'int (*)(struct pex_obj *, long int,  int *, struct pex_time *, int,  const char **, int *)'} [-Wincompatible-pointer-types]
  326 |   pex_unix_wait,
      |   ^~~~~~~~~~~~~
/vol/gcc/src/hg/master/local/libiberty/pex-unix.c:326:3: note: (near initialization for 'funcs.wait')

While pex_funcs.wait expects a function returning pid_t, pex_unix_wait
currently returns int.  However, on Solaris pid_t is long for 32-bit,
but int for 64-bit.

This patch fixes this by having pex_unix_wait return pid_t as
expected, like every other variant already does.

Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, x86_64-pc-linux-gnu, and
x86_64-apple-darwin23.1.0.

2023-12-03  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

libiberty:
* pex-unix.c (pex_unix_wait): Change return type to pid_t.

5 months agoAllow targets to add USEs to asms
Richard Sandiford [Tue, 5 Dec 2023 09:52:41 +0000 (09:52 +0000)] 
Allow targets to add USEs to asms

Arm's SME has an array called ZA that for inline asm purposes
is effectively a form of special-purpose memory.  It doesn't
have an associated storage type and so can't be passed and
returned in normal C/C++ objects.

We'd therefore like "za" in a clobber list to mean that an inline
asm can read from and write to ZA.  (Just reading or writing
individually is unlikely to be useful, but we could add syntax
for that too if necessary.)
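
A rough sketch of the intended source-level usage (the empty asm body
is just a placeholder):

  void
  use_za (void)
  {
    /* "za" in the clobber list tells the compiler that the asm may
       read from and write to the ZA array.  */
    asm volatile ("" ::: "za");
  }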

There is currently a TARGET_MD_ASM_ADJUST target hook that allows
targets to add clobbers to an asm instruction.  This patch
extends that to allow targets to add USEs as well.

gcc/
* target.def (md_asm_adjust): Add a uses parameter.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_asm_loc): Update call to md_asm_adjust.
Handle any USEs created by the target.
(expand_asm_stmt): Likewise.
* recog.cc (asm_noperands): Handle asms with USEs.
(decode_asm_operands): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add uses
parameter.
* config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise.
* config/arm/arm.cc (thumb1_md_asm_adjust): Likewise.
* config/avr/avr.cc (avr_md_asm_adjust): Likewise.
* config/cris/cris.cc (cris_md_asm_adjust): Likewise.
* config/i386/i386.cc (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.cc (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.cc (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.cc (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.cc (rs6000_md_asm_adjust): Likewise.
* config/s390/s390.cc (s390_md_asm_adjust): Likewise.
* config/vax/vax.cc (vax_md_asm_adjust): Likewise.
* config/visium/visium.cc (visium_md_asm_adjust): Likewise.

5 months agoAdd a new target hook: TARGET_START_CALL_ARGS
Richard Sandiford [Tue, 5 Dec 2023 09:44:52 +0000 (09:44 +0000)] 
Add a new target hook: TARGET_START_CALL_ARGS

We have the following two hooks into the call expansion code:

- TARGET_CALL_ARGS is called for each argument before arguments
  are moved into hard registers.

- TARGET_END_CALL_ARGS is called after the end of the call
  sequence (specifically, after any return value has been
  moved to a pseudo).

This patch adds a TARGET_START_CALL_ARGS hook that is called before
the TARGET_CALL_ARGS sequence.  This means that TARGET_START_CALL_ARGS
and TARGET_END_CALL_ARGS bracket the region in which argument registers
might be live.  They also bracket a region in which the only call
emitted by target-independent code is the call to the target function
itself.  (For example, TARGET_START_CALL_ARGS happens after any use of
memcpy to copy arguments, and TARGET_END_CALL_ARGS happens before any
use of memcpy to copy the result.)

Also, the patch adds the cumulative argument structure as an argument
to the hooks, so that the target can use it to record and retrieve
information about the call as a whole.

The TARGET_CALL_ARGS docs said:

   While generating RTL for a function call, this target hook is invoked once
   for each argument passed to the function, either a register returned by
   ``TARGET_FUNCTION_ARG`` or a memory location.  It is called just
-  before the point where argument registers are stored.

The last bit was true for normal calls, but for libcalls the hook was
invoked earlier, before stack arguments have been copied.  I don't think
this caused a practical difference for nvptx (the only port to use the
hooks) since I wouldn't expect any libcalls to take stack parameters.

gcc/
* doc/tm.texi.in: Add TARGET_START_CALL_ARGS.
* doc/tm.texi: Regenerate.
* target.def (start_call_args): New hook.
(call_args, end_call_args): Add a parameter for the cumulative
argument information.
* hooks.h (hook_void_rtx_tree): Delete.
* hooks.cc (hook_void_rtx_tree): Likewise.
* targhooks.h (hook_void_CUMULATIVE_ARGS): Declare.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* targhooks.cc (hook_void_CUMULATIVE_ARGS): New function.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* calls.cc (expand_call): Call start_call_args before computing
and storing stack parameters.  Pass the cumulative argument
information to call_args and end_call_args.
(emit_library_call_value_1): Likewise.
* config/nvptx/nvptx.cc (nvptx_call_args): Add a cumulative
argument parameter.
(nvptx_end_call_args): Likewise.

5 months agoaarch64: fix eh_return-3.c test
Szabolcs Nagy [Mon, 4 Dec 2023 13:30:13 +0000 (13:30 +0000)] 
aarch64: fix eh_return-3.c test

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-3.c: Fix when retaa is available.

5 months agoAdd a target hook for sibcall epilogues
Richard Sandiford [Tue, 5 Dec 2023 09:35:57 +0000 (09:35 +0000)] 
Add a target hook for sibcall epilogues

Epilogues for sibling calls are generated using the
sibcall_epilogue pattern.  One disadvantage of this approach
is that the target doesn't know which call the epilogue is for,
even though the code that generates the pattern has the call
to hand.

Although call instructions are currently rtxes, and so could be
passed as an operand to the pattern, the main point of introducing
rtx_insn was to move towards separating the rtx and insn types
(a good thing IMO).  There also isn't an existing practice of
passing genuine instructions (as opposed to labels) to
instruction patterns.

This patch therefore adds a hook that can be defined as an
alternative to sibcall_epilogue.  The advantage is that it
can be passed the call; the disadvantage is that it can't
use .md conveniences like generating instructions from
textual patterns (although most epilogues are too complex
to benefit much from that anyway).

gcc/
* doc/tm.texi.in: Add TARGET_EMIT_EPILOGUE_FOR_SIBCALL.
* doc/tm.texi: Regenerate.
* target.def (emit_epilogue_for_sibcall): New hook.
* calls.cc (can_implement_as_sibling_call_p): Use it.
* function.cc (thread_prologue_and_epilogue_insns): Likewise.
(reposition_prologue_and_epilogue_notes): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_expand_epilogue): Take
an rtx_call_insn * rather than a bool.
* config/aarch64/aarch64.cc (aarch64_expand_epilogue): Likewise.
(TARGET_EMIT_EPILOGUE_FOR_SIBCALL): Define.
* config/aarch64/aarch64.md (epilogue): Update call.
(sibcall_epilogue): Delete.

5 months agoc: Turn -Wimplicit-function-declaration into a permerror: Fix 'gcc.dg/gnu23-builtins...
Thomas Schwinge [Fri, 1 Dec 2023 15:52:06 +0000 (16:52 +0100)] 
c: Turn -Wimplicit-function-declaration into a permerror: Fix 'gcc.dg/gnu23-builtins-no-dfp-1.c'

With recent commit 55e94561e97ed0bce4774aa1c6b5d5d82209a379
"c: Turn -Wimplicit-function-declaration into a permerror", this test
case, added in 2019 commit 5b8d9367684f266c30c280b4d3c98830a88c70ab
"Prevent all uses of DFP when unsupported (PR c/91985)" started FAILing
(for applicable configurations):

    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 13)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 14)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 15)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 16)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 17)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c  (test for warnings, line 18)
    [-PASS:-]{+FAIL:+} gcc.dg/gnu23-builtins-no-dfp-1.c (test for excess errors)

This is due to:

    [...]/gcc.dg/gnu23-builtins-no-dfp-1.c:13:13: error: implicit declaration of function '__builtin_fabsd32'; did you mean '__builtin_fabsf32'? [-Wimplicit-function-declaration]
    [...]

Specifying '-fpermissive', commit f37744662cbc74efcceb790b99dcd6521c51a578
"[committed] Fix gnu23-builtins-no-dfp" subsequently resolved the FAILs, but
patch review concluded that for this test case it's secondary *how*
"implicit declaration of function" is diagnosed, so we'd test the standard
way, which instead of "warning" now is "error".

gcc/testsuite/
* gcc.dg/gnu23-builtins-no-dfp-1.c: Remove '-fpermissive'.
'dg-error "implicit"' instead of 'dg-warning "implicit"'.

5 months agoAllow prologues and epilogues to be inserted later
Richard Sandiford [Tue, 5 Dec 2023 09:28:46 +0000 (09:28 +0000)] 
Allow prologues and epilogues to be inserted later

Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions.  It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well.  (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.

gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/tm.texi: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.

5 months agoRISC-V: Check if zcd conflicts with zcmt and zcmp
Kito Cheng [Mon, 27 Nov 2023 12:50:11 +0000 (20:50 +0800)] 
RISC-V: Check if zcd conflicts with zcmt and zcmp

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::check_conflict_ext): Check zcd conflicts
with zcmt and zcmp.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-29.c: New test.
* gcc.target/riscv/arch-30.c: New test.

5 months agolra: Updates of biggest mode for hard regs [PR112278]
Richard Sandiford [Tue, 5 Dec 2023 09:20:55 +0000 (09:20 +0000)] 
lra: Updates of biggest mode for hard regs [PR112278]

LRA keeps track of the biggest mode for both hard registers and
pseudos.  The updates assume that the modes are ordered, i.e. that
we can tell whether one is no bigger than the other at compile time.

That is (or at least seemed to be) a reasonable restriction for pseudos.
But it isn't necessarily so for hard registers, since the uses of hard
registers can be logically distinct.  The testcase is an example of this.

The biggest mode of hard registers is also special for other reasons.
As the existing comment says:

  /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as
     part of a multi-word register.  In that case, just use the reg_rtx
     mode.  Do the same also if the biggest mode was larger than a register
     or we can not compare the modes.  Otherwise, limit the size to that of
     the biggest access in the function or to the natural mode at least.  */

This patch applies the same approach to the updates.

gcc/
PR rtl-optimization/112278
* lra-int.h (lra_update_biggest_mode): New function.
* lra-coalesce.cc (merge_pseudos): Use it.
* lra-lives.cc (process_bb_lives): Likewise.
* lra.cc (new_insn_reg): Likewise.

gcc/testsuite/
PR rtl-optimization/112278
* gcc.target/aarch64/sve/pr112278.c: New test.

5 months agolower-bitint: Make temporarily wrong IL less wrong [PR112843]
Jakub Jelinek [Tue, 5 Dec 2023 08:45:40 +0000 (09:45 +0100)] 
lower-bitint: Make temporarily wrong IL less wrong [PR112843]

As discussed in the PR, for the middle (on x86-64 65..128 bit) _BitInt
types like
  _1 = x_4(D) * 5;
where _1 and x_4(D) have _BitInt(128) type and x is PARM_DECL, the bitint
lowering pass wants to replace this with
  _13 = (int128_t) x_4(D);
  _12 = _13 * 5;
  _1 = (_BitInt(128)) _12;
where _13 and _12 have int128_t type and the ranger ICEs when the IL is
temporarily invalid:
during GIMPLE pass: bitintlower
pr112843.c: In function ‘foo’:
pr112843.c:7:1: internal compiler error: Segmentation fault
    7 | foo (_BitInt (128) x, _BitInt (256) y)
      | ^~~
0x152943f crash_signal
        ../../gcc/toplev.cc:316
0x25c21c8 ranger_cache::range_of_expr(vrange&, tree_node*, gimple*)
        ../../gcc/gimple-range-cache.cc:1204
0x25cdcf9 fold_using_range::range_of_range_op(vrange&, gimple_range_op_handler&, fur_source&)
        ../../gcc/gimple-range-fold.cc:671
0x25cf9a0 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
        ../../gcc/gimple-range-fold.cc:602
0x25b5520 gimple_ranger::update_stmt(gimple*)
        ../../gcc/gimple-range.cc:564
0x16f1234 update_stmt_operands(function*, gimple*)
        ../../gcc/tree-ssa-operands.cc:1150
0x117a5b6 update_stmt_if_modified(gimple*)
        ../../gcc/gimple-ssa.h:187
0x117a5b6 update_stmt_if_modified(gimple*)
        ../../gcc/gimple-ssa.h:184
0x117a5b6 update_modified_stmt
        ../../gcc/gimple-iterator.cc:44
0x117a5b6 gsi_insert_after(gimple_stmt_iterator*, gimple*, gsi_iterator_update)
        ../../gcc/gimple-iterator.cc:544
0x25abc2f gimple_lower_bitint
        ../../gcc/gimple-lower-bitint.cc:6348

What the code does right now is, it first creates a new SSA_NAME (_12
above), adds the
  _1 = (_BitInt(128)) _12;
stmt after it (where it crashes, because _12 has no SSA_NAME_DEF_STMT yet),
then sets lhs of the previous stmt to _12 (this is also temporarily
incorrect, there are incompatible types involved in the stmt), later on
changes also operands and finally update_stmt it.

The following patch instead changes the lhs of the stmt before adding the
cast after it.  The question is if this is less or more wrong temporarily
(but the ICE is gone).  In addition to that the patch moves the operand
adjustments before the lhs adjustment.

The reason I tweaked the lhs first is that it then just uses gimple_op and
iterates over all ops; if that were done before the lhs, it would need to
special-case which op to skip because it is the lhs (I'm using gimple_get_lhs
for the lhs, but this isn't done for GIMPLE_CALL nor GIMPLE_PHI, so
GIMPLE_ASSIGN or say GIMPLE_GOTO etc. are the only options).

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112843
* gimple-lower-bitint.cc (gimple_lower_bitint): Change lhs of stmt
to lhs2 before building and inserting lhs = (cast) lhs2; assignment.
Adjust stmt operands before adjusting lhs.

* gcc.dg/bitint-47.c: New test.

5 months agoRISC-V: FAIL:g++.dg/torture/vshuf-v[2|4]di.C -Os (execution test) on RV32
xuli [Tue, 5 Dec 2023 05:58:35 +0000 (05:58 +0000)] 
RISC-V: FAIL:g++.dg/torture/vshuf-v[2|4]di.C -Os (execution test) on RV32

This patch fixes the issue of g++.dg/torture/vshuf-v2di.C
and g++.dg/torture/vshuf-v4di.C -Os execution failure with
-march=rv32gcv -mabi=ilp32d.

Consider the following code:
typedef unsigned long long V __attribute__((vector_size(16)));

.LC0: 0xc1c2c3c4c5c6c7c8

before this patch:

lui a5,%hi(.LC0)
addi a5,a5,%lo(.LC0)
lw a6,4(a5)//0xc1c2c3c4
lw a5,0(a5)//0xc5c6c7c8
vsetivli zero,2,e64,m1,ta,mu
vmv.v.x v2,a5//v2 is {0xffffffffc5c6c7c8, 0xffffffffc5c6c7c8}

after this patch:

lui a5,%hi(.LC0)
addi a5,a5,%lo(.LC0)
vsetivli zero,2,e64,m1,ta,mu
vlse64.v v2,0(a5),zero//v2 is {0xc1c2c3c4c5c6c7c8, 0xc1c2c3c4c5c6c7c8}

gcc/ChangeLog:

* config/riscv/riscv-v.cc (sew64_scalar_helper): Bugfix.

5 months agoi386: Improve code generation for vector __builtin_signbit (x.x[i]) ? -1 : 0 [PR112816]
Jakub Jelinek [Tue, 5 Dec 2023 08:08:45 +0000 (09:08 +0100)] 
i386: Improve code generation for vector __builtin_signbit (x.x[i]) ? -1 : 0 [PR112816]

On the testcase I've recently fixed I've noticed bad code generation,
we emit
        pxor    %xmm1, %xmm1
        psrld   $31, %xmm0
        pcmpeqd %xmm1, %xmm0
        pcmpeqd %xmm1, %xmm0
or
        vpxor   %xmm1, %xmm1, %xmm1
        vpsrld  $31, %xmm0, %xmm0
        vpcmpeqd        %xmm1, %xmm0, %xmm0
        vpcmpeqd        %xmm1, %xmm0, %xmm2
rather than
        psrad   $31, %xmm2
or
        vpsrad  $31, %xmm1, %xmm2
The following patch fixes that using a combiner splitter.
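
A rough reduction of the kind of source involved (an assumed shape, not
the actual pr112816.c testcase):

  typedef float V __attribute__ ((vector_size (16)));
  typedef int VI __attribute__ ((vector_size (16)));

  VI
  f (V x)
  {
    VI r;
    for (int i = 0; i < 4; i++)
      /* -1 when the sign bit is set, 0 otherwise; this is the select
         that should reduce to a single psrad $31.  */
      r[i] = __builtin_signbit (x[i]) ? -1 : 0;
    return r;
  }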

2023-12-05  Jakub Jelinek  <jakub@redhat.com>

PR target/112816
* config/i386/sse.md ((eq (eq (lshiftrt x elt_bits-1) 0) 0)): New
splitter to turn psrld $31; pcmpeq; pcmpeq into psrad $31.

* gcc.target/i386/pr112816.c: New test.

5 months agoRISC-V: Add blocker for gather/scatter auto-vectorization
Juzhe-Zhong [Tue, 5 Dec 2023 03:22:50 +0000 (11:22 +0800)] 
RISC-V: Add blocker for gather/scatter auto-vectorization

This patch fixes ICE exposed on full coverage testing:

                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)
                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic --param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)
                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m4 ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)
                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m4 --param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)
                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m8 ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)
                                === g++: Unexpected fails for rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at machmode.h:313)

The root cause is that we can't extend RVVM4SImode into RVVM8DImode on zve32f.
Add a blocker to disable such auto-vectorization in this situation.

gcc/ChangeLog:

* config/riscv/autovec.md: Add blocker.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New function.
* config/riscv/riscv-v.cc (gather_scatter_valid_offset_p): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-2.C: New test.

5 months agoc/89270 - honor registered_builtin_types in type_for_size
Richard Biener [Mon, 4 Dec 2023 13:03:37 +0000 (14:03 +0100)] 
c/89270 - honor registered_builtin_types in type_for_size

The following fixes the intermediate conversions inserted by
convert_to_integer when facing address-spaces and converts
to their effective [u]intptr_t when they are registered_builtin_types
by considering those also from c_common_type_for_size and not
only from c_common_type_for_mode.
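
A rough illustration of the kind of code affected, using the AVR __memx
named address space (assumed for exposition; not the actual pr89270.c
testcase):

  unsigned long
  to_uint (const __memx char *p)
  {
    /* The intermediate conversion should use the [u]intptr_t type
       registered for this address space, not a generic pointer-sized
       integer.  */
    return (unsigned long) p;
  }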

PR c/89270
gcc/c-family/
* c-common.cc (c_common_type_for_size): Consider
registered_builtin_types.

gcc/testsuite/
* gcc.target/avr/pr89270.c: New testcase.

5 months agoc/86869 - preserve address-space info when building qualified ARRAY_TYPE
Richard Biener [Mon, 4 Dec 2023 12:31:35 +0000 (13:31 +0100)] 
c/86869 - preserve address-space info when building qualified ARRAY_TYPE

The following adjusts the C FE specific qualified type building
to preserve address-space info also for ARRAY_TYPE.
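
A rough example of the kind of declaration involved (assumed for
exposition; not the actual pr86869.c testcase):

  /* The element type is qualified with the AVR __flash address space;
     the qualified array type built for it must keep that address
     space.  */
  const __flash int table[4] = { 1, 2, 3, 4 };

  int
  get (unsigned int i)
  {
    return table[i];
  }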

PR c/86869
gcc/c/
* c-typeck.cc (c_build_qualified_type): Preserve address-space
info for ARRAY_TYPE.

gcc/testsuite/
* gcc.target/avr/pr86869.c: New testcase.

5 months agotree-optimization/112827 - more SCEV cprop fixes
Richard Biener [Mon, 4 Dec 2023 14:46:38 +0000 (15:46 +0100)] 
tree-optimization/112827 - more SCEV cprop fixes

The insert iteration can be corrupted by foldings of replace_uses_by,
within this particular PHI replacement but also with subsequent ones.
Recompute the insert location before insertion instead.

This fixes an observed ICE in gcc.dg/tree-ssa/ssa-sink-16.c.

PR tree-optimization/112827
PR tree-optimization/112848
* tree-scalar-evolution.cc (final_value_replacement_loop):
Compute the insert location for each insert.

5 months agoTake register pressure into account for vec_construct/scalar_to_vec when the componen...
liuhongt [Mon, 27 Nov 2023 05:35:41 +0000 (13:35 +0800)] 
Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

For vec_construct, the components must be live at the same time if
they're not loaded from memory; when the number of those components
exceeds the available registers, spills happen.  Try to account for
that with a rough estimation.
??? Ideally, we should have an overall estimation of register pressure
if we know the live range of all variables.
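
A rough example of the situation being costed (an assumed shape, not a
testcase from the patch): building a vector from many scalars that are
all live at once.

  typedef int V __attribute__ ((vector_size (64)));

  V
  build (int a, int b, int c, int d, int e, int f, int g, int h,
         int i, int j, int k, int l, int m, int n, int o, int p)
  {
    /* All sixteen scalars must be live at the CONSTRUCTOR; if that
       exceeds the available GPRs, spills happen.  */
    return (V) { a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p };
  }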

gcc/ChangeLog:

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Count sse_reg/gpr_regs for components not loaded from memory.
(ix86_vector_costs:ix86_vector_costs): New constructor.
(ix86_vector_costs::m_num_gpr_needed[3]): New private member.
(ix86_vector_costs::m_num_sse_needed[3]): Ditto.
(ix86_vector_costs::finish_cost): Estimate overall register
pressure cost.
(ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
function.

5 months agoSupport udot_prodv*qi with emulation sdot_prodv*hi
liuhongt [Mon, 4 Dec 2023 03:47:32 +0000 (11:47 +0800)] 
Support udot_prodv*qi with emulation sdot_prodv*hi

Like r14-5990-gb4a7c1c8c59d19, but the patch optimized for udot_prod.

Since (zero_extend) (unsigned char) -> int is equal
to (zero_extend) (unsigned char) -> short
+ (sign_extend) (short) -> int

It should be safe to emulate udot_prodv*qi with

     vec_unpacku_lo_v32qi
     vec_unpacku_lo_v32qi
     vec_unpacku_hi_v32qi
     vec_unpacku_hi_v32qi
     sdot_prodv16hi
     sdot_prodv16hi
     add3v8si
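
For reference, a rough example of the canonical loop shape that maps to
udot_prod (assumed; not the actual udotprodint8_emulate.c testcase):

  int
  udot (const unsigned char *a, const unsigned char *b, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      /* unsigned char * unsigned char accumulated into int.  */
      sum += a[i] * b[i];
    return sum;
  }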

gcc/ChangeLog:

* config/i386/sse.md (udot_prodv64qi): New expander.
(udot_prod<mode>): Emulates with VEC_UNPACKU_EXPR +
DOT_PROD (short, int).

gcc/testsuite/ChangeLog:

* gcc.target/i386/udotprodint8_emulate.c: New test.

5 months agoc++: implement P2564, consteval needs to propagate up [PR107687]
Marek Polacek [Tue, 19 Sep 2023 20:31:17 +0000 (16:31 -0400)] 
c++: implement P2564, consteval needs to propagate up [PR107687]

This patch implements P2564, described at <wg21.link/p2564>, whereby
certain functions are promoted to consteval.  For example:

  consteval int id(int i) { return i; }

  template <typename T>
  constexpr int f(T t)
  {
    return t + id(t); // id causes f<int> to be promoted to consteval
  }

  void g(int i)
  {
    f (3);
  }

now compiles.  Previously the code was ill-formed: we would complain
that 't' in 'f' is not a constant expression.  Since 'f' is now
consteval, it means that the call to id(t) is in an immediate context,
so doesn't have to produce a constant -- this is how we allow composing
consteval functions.  But making 'f<int>' consteval also means that
the call to 'f' in 'g' must yield a constant; failure to do so results
in an error.  I made the effort to have cc1plus explain to us what's
going on.  For example, calling f(i) produces this neat diagnostic:

w.C:11:11: error: call to consteval function 'f<int>(i)' is not a constant expression
   11 |         f (i);
      |         ~~^~~
w.C:11:11: error: 'i' is not a constant expression
w.C:6:22: note: 'constexpr int f(T) [with T = int]' was promoted to an immediate function because its body contains an immediate-escalating expression 'id(t)'
    6 |         return t + id(t); // id causes f<int> to be promoted to consteval
      |                    ~~^~~

which hopefully makes it clear what's going on.

Implementing this proposal has been tricky.  One problem was delayed
instantiation: instantiating a function can set off a domino effect
where one call promotes a function to consteval but that then means
that another function should also be promoted, etc.

In v1, I addressed the delayed instantiation problem by instantiating
trees early, so that we can escalate functions right away.  That caused
a number of problems, and in certain cases, like consteval-prop3.C, it
can't work, because we need to wait till EOF to see the definition of
the function anyway.  Overeager instantiation tends to cause diagnostic
problems too.

In v2, I attempted to move the escalation to the gimplifier, at which
point all templates have been instantiated.  That attempt flopped,
however, because once we've gimplified a function, its body is discarded
and as a consequence, you can no longer evaluate a call to that function
which is required for escalating, which needs to decide if a call is
a constant expression or not.

Therefore, we have to perform the escalation before gimplifying, but
after instantiate_pending_templates.  That's not easy because we have
no way to walk all the trees.  In the v2 patch, I use two vectors: one
to store function decls that may become consteval, and another to
remember references to immediate-escalating functions.  Unfortunately
the latter must also stash functions that call immediate-escalating
functions.  Consider:

  int g(int i)
  {
    f<int>(i); // f is immediate-escalating
  }

where g itself is not immediate-escalating, but we have to make sure
that if f gets promoted to consteval, we give an error.

A new option, -fno-immediate-escalation, is provided to suppress
escalating functions.

v2 also adds a new flag, DECL_ESCALATION_CHECKED_P, so that we don't
escalate a function multiple times, and so that we can distinguish between
explicitly consteval functions and functions that have been promoted
to consteval.

In v3, I removed one of the new vectors and changed the other one
to a hash set.  This version also contains numerous cleanups.

v4 merges find_escalating_expr_r into cp_fold_immediate_r.  It also
adds a new optimization in cp_fold_function.

v5 greatly simplifies the code.

v6 simplifies the code further and removes an ff_ flag.

v7 removes maybe_promote_function_to_consteval and further simplifies
cp_fold_immediate_r logic.

v8 removes maybe_store_immediate_escalating_fn.

PR c++/107687
PR c++/110997

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update __cpp_consteval.
* c-opts.cc (c_common_post_options): Pre-C++20, unset
flag_immediate_escalation.
* c.opt (fimmediate-escalation): New option.

gcc/cp/ChangeLog:

* call.cc (in_immediate_context): No longer static.
* constexpr.cc (cxx_eval_call_expression): Adjust assert.
* cp-gimplify.cc (deferred_escalating_exprs): New vec.
(remember_escalating_expr): New.
(enum fold_flags): Remove ff_fold_immediate.
(immediate_escalating_function_p): New.
(unchecked_immediate_escalating_function_p): New.
(promote_function_to_consteval): New.
(cp_fold_immediate): Move above.  Return non-null if any errors were
emitted.
(maybe_explain_promoted_consteval): New.
(cp_gimplify_expr) <case CALL_EXPR>: Assert we've handled all
immediate invocations.
(taking_address_of_imm_fn_error): New.
(cp_fold_immediate_r): Merge ADDR_EXPR and PTRMEM_CST cases.  Implement
P2564 - promoting functions to consteval.
<case CALL_EXPR>: Implement P2564 - promoting functions to consteval.
(cp_fold_r): If an expression turns into a CALL_EXPR after cp_fold,
call cp_fold_immediate_r on the CALL_EXPR.
(cp_fold_function): Set DECL_ESCALATION_CHECKED_P if
deferred_escalating_exprs does not contain current_function_decl.
(process_and_check_pending_immediate_escalating_fns): New.
* cp-tree.h (struct lang_decl_fn): Add escalated_p bit-field.
(DECL_ESCALATION_CHECKED_P): New.
(immediate_invocation_p): Declare.
(process_pending_immediate_escalating_fns): Likewise.
* decl2.cc (c_parse_final_cleanups): Set at_eof to 2 after all
templates have been instantiated; and to 3 at the end of the function.
Call process_pending_immediate_escalating_fns.
* error.cc (dump_template_bindings): Check at_eof against an updated
value.
* module.cc (trees_out::lang_decl_bools): Stream escalated_p.
(trees_in::lang_decl_bools): Likewise.
* pt.cc (push_tinst_level_loc): Set at_eof to 3, not 2.
* typeck.cc (cp_build_addr_expr_1): Don't check
DECL_IMMEDIATE_FUNCTION_P.

gcc/ChangeLog:

* doc/invoke.texi: Document -fno-immediate-escalation.

libstdc++-v3/ChangeLog:

* testsuite/18_support/comparisons/categories/zero_neg.cc: Add
dg-prune-output.
* testsuite/std/format/string_neg.cc: Add dg-error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/consteval-if10.C: Remove dg-error.
* g++.dg/cpp23/consteval-if2.C: Likewise.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected value of __cpp_consteval.
* g++.dg/cpp26/feat-cxx26.C: Likewise.
* g++.dg/cpp2a/consteval-memfn1.C: Add dg-error.
* g++.dg/cpp2a/consteval11.C: Likewise.
* g++.dg/cpp2a/consteval3.C: Adjust dg-error.
* g++.dg/cpp2a/consteval34.C: Add dg-error.
* g++.dg/cpp2a/consteval36.C: Likewise.
* g++.dg/cpp2a/consteval9.C: Likewise.
* g++.dg/cpp2a/feat-cxx2a.C: Adjust expected value of __cpp_consteval.
* g++.dg/cpp2a/spaceship-synth9.C: Adjust dg-error.
* g++.dg/cpp2a/consteval-prop1.C: New test.
* g++.dg/cpp2a/consteval-prop10.C: New test.
* g++.dg/cpp2a/consteval-prop11.C: New test.
* g++.dg/cpp2a/consteval-prop12.C: New test.
* g++.dg/cpp2a/consteval-prop13.C: New test.
* g++.dg/cpp2a/consteval-prop14.C: New test.
* g++.dg/cpp2a/consteval-prop15.C: New test.
* g++.dg/cpp2a/consteval-prop16.C: New test.
* g++.dg/cpp2a/consteval-prop17.C: New test.
* g++.dg/cpp2a/consteval-prop18.C: New test.
* g++.dg/cpp2a/consteval-prop19.C: New test.
* g++.dg/cpp2a/consteval-prop20.C: New test.
* g++.dg/cpp2a/consteval-prop2.C: New test.
* g++.dg/cpp2a/consteval-prop3.C: New test.
* g++.dg/cpp2a/consteval-prop4.C: New test.
* g++.dg/cpp2a/consteval-prop5.C: New test.
* g++.dg/cpp2a/consteval-prop6.C: New test.
* g++.dg/cpp2a/consteval-prop7.C: New test.
* g++.dg/cpp2a/consteval-prop8.C: New test.
* g++.dg/cpp2a/consteval-prop9.C: New test.

5 months agoDaily bump.
GCC Administrator [Tue, 5 Dec 2023 00:17:20 +0000 (00:17 +0000)] 
Daily bump.

5 months agoc++: fix constexpr noreturn diagnostic
Jason Merrill [Mon, 4 Dec 2023 22:42:13 +0000 (17:42 -0500)] 
c++: fix constexpr noreturn diagnostic

Mentioning a noreturn function does not involve an lvalue-rvalue
conversion.

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Fix
check for loading volatile lvalue.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-noreturn1.C: New test.

5 months agoMATCH: Fix zero_one_valued_p's convert pattern
Andrew Pinski [Sun, 12 Nov 2023 04:33:28 +0000 (20:33 -0800)] 
MATCH: Fix zero_one_valued_p's convert pattern

While working on PR 111972, I was getting a regression
due to zero_one_valued_p matching a signed 1 bit integer
when it came to convert. This patch fixes that by checking
the outer type too.
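
A rough illustration of the problematic case (assumed, not taken from
the patch): a signed 1-bit bit-field holds the values 0 and -1, so a
conversion from it must not be treated as producing only 0 or 1.

  struct S
  {
    signed int b : 1;   /* values are 0 and -1 */
  };

  int
  f (struct S s)
  {
    /* s.b * 3 is 0 or -3; treating s.b as a 0/1 value here would
       miscompile the multiplication.  */
    return s.b * 3;
  }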

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (zero_one_valued_p): For convert
make sure type is not a signed 1-bit integer.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
5 months ago[committed] Fix HImode load mnemonic on microblaze port
Jeff Law [Mon, 4 Dec 2023 17:06:49 +0000 (10:06 -0700)] 
[committed] Fix HImode load mnemonic on microblaze port

The tester recently started failing va-arg-22.c on microblaze-linux:

gcc.c-torture/execute/va-arg-22.c   -O0  (test for excess errors)

It was failing with an undefined reference to "r7" at link time.  This was
ultimately tracked down to a HImode load using (reg+reg) addressing mode
which used the lhui instruction instead of lhu.  The "i" means it's supposed to
be (reg+disp) so the assembler tried to interpret "r7" as an immediate/symbol.

The port uses %i<opnum> as an output modifier to select between sh/shi and
various other mnemonics for loads/stores.  The movhi pattern simply failed to
use it for the two cases where it's loading from memory (interestingly enough
it was used for stores).

Clearly we aren't using reg+reg much for HImode loads as this didn't fix
anything else in the testsuite.

gcc/
* config/microblaze/microblaze.md (movhi): Use %i for half-word
loads to properly select between lhu/lhui.

5 months agoRISC-V: testsuite: Remove redundant vector_hw and zvfh_hw.
Robin Dapp [Tue, 21 Nov 2023 12:31:05 +0000 (13:31 +0100)] 
RISC-V: testsuite: Remove redundant vector_hw and zvfh_hw.

This replaces the now-redundant vector_hw and zvfh_hw checks in the
testsuite by riscv_v and riscv_zvfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c:
Replace riscv_zvfh_hw with riscv_zvfh.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c:
Ditto.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: Allow
overriding N.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: Replace
riscv zvfh_hw with riscv_zvfh.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c: Ditto.
* lib/target-supports.exp: Remove riscv_vector_hw and
riscv_zvfh_hw.

5 months agoRISC-V: Fix two testscases related to -std changes.
Robin Dapp [Mon, 4 Dec 2023 12:22:18 +0000 (13:22 +0100)] 
RISC-V: Fix two testscases related to -std changes.

Recent -std changes caused testsuite failures.  Fix those by adding
-std=gnu99 and -Wno-incompatible-pointer-types.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112552.c: Add
-Wno-incompatible-pointer-types.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c:
Add -std=gnu99.

5 months agoRISC-V: Fix rawmemchr implementation.
Robin Dapp [Fri, 1 Dec 2023 08:45:29 +0000 (09:45 +0100)] 
RISC-V: Fix rawmemchr implementation.

This fixes a bug in the rawmemchr implementation by incrementing the
source address by vl * element_size instead of just vl.

This is normally harmless, as we will just scan the same region more
than once, but in combination with an older qemu version it leads to an
execution failure in SPEC2017.
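
Conceptually the fix looks like this (a sketch only; the actual change
adjusts the RTL emitted by expand_rawmemchr, and the names here are
illustrative):

  static void
  advance (const char **src, long vl, long element_size)
  {
    /* Advance by the number of bytes scanned, not by the element
       count (the previous code effectively did *src += vl).  */
    *src += vl * element_size;
  }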

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_rawmemchr): Increment
source address by vl * element_size.

5 months agoRISC-V: Rename and unify stringop strategy handling.
Robin Dapp [Fri, 1 Dec 2023 08:30:17 +0000 (09:30 +0100)] 
RISC-V: Rename and unify stringop strategy handling.

In preparation for the vectorized strlen and strcmp support this NFC
patch unifies the stringop strategy handling a bit.  The "auto"
strategy now is a combination of scalar and vector and an expander
should try the strategies in their preferred order.

For the block_move expander this patch does just that.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
Rename...
(enum stringop_strategy_enum): ... to this.
* config/riscv/riscv-string.cc (riscv_expand_block_move): New
wrapper expander handling the strategies and delegation.
(riscv_expand_block_move_scalar): Rename function and make
static.
(expand_block_move): Remove strategy handling.
* config/riscv/riscv.md: Call expander wrapper.
* config/riscv/riscv.opt: Rename.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-strategy-1.c: Change to
-mstringop-strategy.
* gcc.target/riscv/rvv/base/cpymem-strategy-2.c: Ditto.
* gcc.target/riscv/rvv/base/cpymem-strategy-3.c: Ditto.
* gcc.target/riscv/rvv/base/cpymem-strategy-4.c: Ditto.
* gcc.target/riscv/rvv/base/cpymem-strategy-5.c: Ditto.

5 months agomiddle-end/112785 - guard against last_clique overflow
Richard Biener [Mon, 4 Dec 2023 13:50:59 +0000 (14:50 +0100)] 
middle-end/112785 - guard against last_clique overflow

The PR shows that we'll ICE eventually when last_clique wraps.  The
following avoids this by refusing to hand out new cliques after
exhausting them.  We then use zero (no clique) as conservative
fallback.
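
A rough sketch of the guard (not the actual function.h code; the real
helper operates on struct function's last_clique counter):

  #include <limits.h>

  static inline unsigned short
  get_new_clique_sketch (unsigned short *last_clique)
  {
    if (*last_clique == USHRT_MAX)
      /* Exhausted: fall back to 0, i.e. "no clique".  */
      return 0;
    return ++*last_clique;
  }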

PR middle-end/112785
* function.h (get_new_clique): New inline function handling
last_clique overflow.
* cfgrtl.cc (duplicate_insn_chain): Use it.
* tree-cfg.cc (gimple_duplicate_bb): Likewise.
* tree-inline.cc (remap_dependence_clique): Likewise.

5 months agoRISC-V: Document optimization parameter riscv-strcmp-inline-limit
Christoph Müllner [Sat, 2 Dec 2023 20:56:57 +0000 (21:56 +0100)] 
RISC-V: Document optimization parameter riscv-strcmp-inline-limit

This patch documents the optimization parameter
riscv-strcmp-inline-limit, which can be used to tweak the behaviour
of -minline-strcmp and -minline-strncmp.

gcc/ChangeLog:

PR target/112650
* doc/invoke.texi: Document riscv-strcmp-inline-limit.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
5 months agoRISC-V: Fix overlap group incorrect overlap on v0
Juzhe-Zhong [Mon, 4 Dec 2023 13:44:56 +0000 (21:44 +0800)] 
RISC-V: Fix overlap group incorrect overlap on v0

In a case of seriously high register pressure (testcase appended in this patch):

We see vluxei8.v       v0,(s1),v1,v0.t which is not allowed.
Since according to RVV ISA:

+;; The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0),
+;; unless the destination vector register is being written with a mask value (e.g., compares) or the scalar result of a reduction.

This case has no spills; however, we expect it to be spilled and the data reloaded.

The root cause is that I made a mistake in a previous patch when matching the dest operand and mask operand constraints:

dest: "=vr"
mask: "vmWc1"

After this patch:

dest: "vd,vr"
mask: "vm,Wc1"

make EEW widening pattern are same as other instruction patterns.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Fix incorrect overlap in v0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-34.c: New test.

5 months agoRISC-V: Support highest-number regno overlap for widen ternary
Juzhe-Zhong [Mon, 4 Dec 2023 13:32:06 +0000 (21:32 +0800)] 
RISC-V: Support highest-number regno overlap for widen ternary

Consider this example:

#include "riscv_vector.h"
void
foo6 (void *in, void *out)
{
  vfloat64m8_t accum = __riscv_vle64_v_f64m8 (in, 4);
  vfloat64m4_t high_eew64 = __riscv_vget_v_f64m8_f64m4 (accum, 1);
  vint64m4_t high_eew64_i = __riscv_vreinterpret_v_f64m4_i64m4 (high_eew64);
  vint32m4_t high_eew32_i = __riscv_vreinterpret_v_i64m4_i32m4 (high_eew64_i);
  vfloat32m4_t high_eew32 = __riscv_vreinterpret_v_i32m4_f32m4 (high_eew32_i);
  vfloat64m8_t result = __riscv_vfwnmsac_vf_f64m8 (accum, 64, high_eew32, 4);
  __riscv_vse64_v_f64m8 (out, result, 4);
}

Before this patch:

foo6:                                   # @foo6
        vsetivli        zero, 4, e32, m4, ta, ma
        vle64.v v8, (a0)
        lui     a0, 272384
        fmv.w.x fa5, a0
        vmv8r.v v16, v8
        vfwnmsac.vf     v16, fa5, v12
        vse64.v v16, (a1)
        ret

After this patch:

foo6:
.LFB5:
.cfi_startproc
lui a5,%hi(.LC0)
flw fa5,%lo(.LC0)(a5)
vsetivli zero,4,e32,m4,ta,ma
vle64.v v8,0(a0)
vfwnmsac.vf v8,fa5,v12
vse64.v v8,0(a1)
ret

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add highest-number overlap support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-37.c: New test.
* gcc.target/riscv/rvv/base/pr112431-38.c: New test.

5 months agotree-optimization/112818 - re-instantiate vector type size check for bswap
Richard Biener [Mon, 4 Dec 2023 11:50:36 +0000 (12:50 +0100)] 
tree-optimization/112818 - re-instantiate vector type size check for bswap

For __builtin_bswap vectorization we still require an equal vector
type size.  Re-instantiate that check.

PR tree-optimization/112818
* tree-vect-stmts.cc (vectorizable_bswap): Check input and
output vector types have the same size.

* gcc.dg/vect/pr112818.c: New testcase.

5 months agoRISC-V: Rename bug-01.C to bug-1.C
Juzhe-Zhong [Mon, 4 Dec 2023 11:57:21 +0000 (19:57 +0800)] 
RISC-V: Rename bug-01.C to bug-1.C

Rename the test to make the RVV tests consistent and to prepare for the following patches.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-01.C: Moved to...
* g++.target/riscv/rvv/autovec/bug-1.C: ...here.

5 months agotree-optimization/112827 - corrupt SCEV cache during SCCP
Richard Biener [Mon, 4 Dec 2023 09:46:11 +0000 (10:46 +0100)] 
tree-optimization/112827 - corrupt SCEV cache during SCCP

The following avoids the SCEV cache corruption caused by my last change
to propagate constant final values immediately.  The easiest fix
is to keep a dead initialization around.
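
For illustration only (a hedged sketch, not one of the committed pr112827
testcases), final value replacement rewrites a use after the loop into a
constant:

int
foo (void)
{
  int i = 0;
  while (i < 100)
    i++;
  /* Final value replacement can substitute the constant 100 here,
     making the loop dead.  */
  return i;
}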

PR tree-optimization/112827
* tree-scalar-evolution.cc (final_value_replacement_loop):
Do not release SSA name but keep a dead initialization around.

* gcc.dg/torture/pr112827-1.c: New testcase.
* gcc.dg/torture/pr112827-2.c: Likewise.

5 months agoRISC-V: Remove earlyclobber from widen reduction
Juzhe-Zhong [Mon, 4 Dec 2023 08:51:06 +0000 (16:51 +0800)] 
RISC-V: Remove earlyclobber from widen reduction

Since the destination of a reduction is not a vector register group, there
is no need to apply an overlap constraint.

This is also confirmed against Clang/LLVM:

The MIR in LLVM that has an early clobber (widening add):
early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24

The MIR in LLVM that has no early clobber (widening reduction):
%48:vr = PseudoVWREDSUM_VS_M2_E8 $noreg(tied-def 0), %17:vrm2, killed %33:vr, %0:gprnox0, 3, 1; example.c:60:26

It is also confirmed that both

vwredsum.vs v24,v8,v24 and vwredsum.vs v8,v8,v24

are legal in LLVM.

To align with LLVM and honor the RISC-V V spec, remove the earlyclobber.

Before this patch:

vwredsum.vs     v8,v24,v8
        vwredsum.vs     v7,v22,v7
        vwredsum.vs     v6,v20,v6
        vwredsum.vs     v5,v18,v5
        vwredsum.vs     v4,v16,v4
        vwredsum.vs     v3,v14,v3
        vwredsum.vs     v2,v12,v2
        vwredsum.vs     v1,v10,v1
        vmv1r.v v9,v8
        vwredsum.vs     v9,v24,v9
        vmv1r.v v24,v7
        vwredsum.vs     v24,v22,v24
        vmv1r.v v22,v6
        vwredsum.vs     v22,v20,v22
        vmv1r.v v20,v5
        vwredsum.vs     v20,v18,v20
        vmv1r.v v18,v4
        vwredsum.vs     v18,v16,v18
        vmv1r.v v16,v3
        vwredsum.vs     v16,v14,v16
        vmv1r.v v14,v2
        vwredsum.vs     v14,v12,v14
        vmv1r.v v12,v1
        vwredsum.vs     v12,v10,v12

After this patch:

vfwredusum.vs v17,v12,v17
vfwredusum.vs v18,v10,v18
vfwredusum.vs v15,v26,v15
vfwredusum.vs v16,v24,v16
vfwredusum.vs v12,v12,v17
vfwredusum.vs v10,v10,v18
vfwredusum.vs v13,v6,v20
vfwredusum.vs v11,v8,v19
vfwredusum.vs v6,v6,v13
vfwredusum.vs v8,v8,v11
vfwredusum.vs v7,v4,v21
vfwredusum.vs v9,v2,v22
vfwredusum.vs v14,v26,v15
vfwredusum.vs v1,v24,v16
vfwredusum.vs v4,v4,v7
vfwredusum.vs v2,v2,v9

This is the same behavior as LLVM and honors the RISC-V V spec.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Remove earlyclobber from widen reduction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-35.c: New test.
* gcc.target/riscv/rvv/base/pr112431-36.c: New test.

5 months agoBTF: fix PR debug/112656
Indu Bhagat [Mon, 4 Dec 2023 09:57:34 +0000 (01:57 -0800)] 
BTF: fix PR debug/112656

PR debug/112656 - btf: function prototypes generated with name

With this patch, all BTF_KIND_FUNC_PROTO types will appear anonymous in the
generated BTF section.

As noted in the discussion in the bugzilla, the number of
BTF_KIND_FUNC_PROTO types output varies across targets (BPF with -mco-re
vs non-BPF targets).  Hence the check in the test case merely checks
that all BTF_KIND_FUNC_PROTO types appear anonymous.
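
As a hedged sketch (not the committed btf-function-7.c test), compiling code
like the following with -gbtf yields BTF_KIND_FUNC_PROTO records, all of which
are now emitted with an empty name:

/* The declaration and the definition each reference a FUNC_PROTO type.  */
extern int add (int a, int b);

int
twice (int x)
{
  return add (x, x);
}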

gcc/ChangeLog:

PR debug/112656
* btfout.cc (btf_asm_type): Fixup ctti_name for all
BTF types of kind BTF_KIND_FUNC_PROTO.

gcc/testsuite/ChangeLog:

PR debug/112656
* gcc.dg/debug/btf/btf-function-7.c: New test.

5 months agoBTF: fix PR debug/112768
Indu Bhagat [Mon, 4 Dec 2023 09:57:25 +0000 (01:57 -0800)] 
BTF: fix PR debug/112768

PR debug/112768 - btf: fix asm comment output for BTF_KIND_FUNC* kinds

The patch adds a small function to abstract out this detail and return
the name of a BTF type.  The patch also fixes the issue of BTF_KIND_FUNC
entries appearing in the comments with a 'null' string.

For btf-function-6.c testcase, after the patch:

        .long   0       # TYPE 2 BTF_KIND_FUNC_PROTO ''
        .long   0xd000002       # btt_info: kind=13, kflag=0, vlen=2
        .long   0x1     # btt_type: (BTF_KIND_INT 'int')
        .long   0       # farg_name
        .long   0x1     # farg_type: (BTF_KIND_INT 'int')
        .long   0       # farg_name
        .long   0x1     # farg_type: (BTF_KIND_INT 'int')
        .long   0       # TYPE 3 BTF_KIND_FUNC_PROTO ''
        .long   0xd000001       # btt_info: kind=13, kflag=0, vlen=1
        .long   0x1     # btt_type: (BTF_KIND_INT 'int')
        .long   0x68    # farg_name
        .long   0x1     # farg_type: (BTF_KIND_INT 'int')
        .long   0x5     # TYPE 4 BTF_KIND_FUNC 'extfunc'
        .long   0xc000002       # btt_info: kind=12, kflag=0, linkage=2
        .long   0x2     # btt_type: (BTF_KIND_FUNC_PROTO '')
        .long   0xd     # TYPE 5 BTF_KIND_FUNC 'foo'
        .long   0xc000001       # btt_info: kind=12, kflag=0, linkage=1
        .long   0x3     # btt_type: (BTF_KIND_FUNC_PROTO '')

gcc/ChangeLog:

PR debug/112768
* btfout.cc (get_btf_type_name): New definition.
(btf_collect_datasec): Update dtd_name to the original type name
string.
(btf_asm_type_ref): Use the new get_btf_type_name function
instead.
(btf_asm_type): Likewise.
(btf_asm_func_type): Likewise.

gcc/testsuite/ChangeLog:

PR debug/112768
* gcc.dg/debug/btf/btf-function-6.c: Empty string expected with
BTF_KIND_FUNC_PROTO.

5 months agoRISC-V: Add test case for bug PR112813
Pan Li [Mon, 4 Dec 2023 08:06:14 +0000 (16:06 +0800)] 
RISC-V: Add test case for bug PR112813

Bugzilla PR 112813 has been fixed recently; add the below test
case for the bug.

PR target/112813

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112813-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
5 months agoi386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]
Jakub Jelinek [Mon, 4 Dec 2023 08:01:09 +0000 (09:01 +0100)] 
i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]

The following testcase ICEs with RTL checking, because the code tests whether
XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking whether SET_SRC (set)
is actually an UNSPEC, so any time we see some other insn with a PARALLEL
and a SET in it whose source is not an UNSPEC, we ICE during RTL checking or
access some other union member as if it were an rt_int.
The rest is just small cleanup.

2023-12-04  Jakub Jelinek  <jakub@redhat.com>

PR target/112837
* config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
for UNSPEC_SET_GOT check that SET_SRC is UNSPEC.  Use SET_SRC and
SET_DEST macros instead of XEXP, rename vec variable to set.

* gcc.dg/pr112837.c: New test.

5 months agoi386: Fix up signbit<mode>2 expander [PR112816]
Jakub Jelinek [Mon, 4 Dec 2023 08:00:18 +0000 (09:00 +0100)] 
i386: Fix up signbit<mode>2 expander [PR112816]

The following testcase ICEs because the signbit<mode>2 expander uses an
explicit SUBREG in the pattern around a match_operand with the register_operand
predicate.  If we are unlucky enough that expansion tries to expand it
with some SUBREG as operands[1], we end up with two nested SUBREGs in the IL,
which is not valid and causes an ICE later.
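
A hedged sketch (not the committed sse2-pr112816.c test) of code that may
reach the signbit<mode>2 expander when vectorized, e.g. at -O2 with -msse2:

void
signbits (const float *a, int *r, int n)
{
  /* The vectorizer may expand the sign-bit test through the vector signbit
     pattern that this fix touches.  */
  for (int i = 0; i < n; i++)
    r[i] = __builtin_signbit (a[i]);
}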

2023-12-04  Jakub Jelinek  <jakub@redhat.com>

PR target/112816
* config/i386/sse.md (signbit<mode>2): Force operands[1] into a REG.

* gcc.target/i386/sse2-pr112816.c: New test.

5 months agoc++: #pragma GCC unroll C++ fixes [PR112795]
Jakub Jelinek [Mon, 4 Dec 2023 07:59:15 +0000 (08:59 +0100)] 
c++: #pragma GCC unroll C++ fixes [PR112795]

foo in the unroll-5.C testcase ICEs because cp_parser_pragma_unroll
during parsing calls maybe_constant_value unconditionally, which is
fine if !processing_template_decl, but can ICE otherwise.

While just calling fold_non_dependent_expr there instead could be enough
to fix the ICE (and is, I guess, the right thing to do for backports, if any),
I don't see a reason why we couldn't handle a dependent #pragma GCC unroll
argument as well: the unrolling isn't done in the FE, and all the middle-end
cares about is that ANNOTATE_EXPR has a 1..65534 last operand when it is
annot_expr_unroll_kind.

So, the following patch changes all the unsigned short unroll arguments
to tree unroll (and thus avoids the tree -> unsigned short -> tree
conversions), does the type and value checking during parsing only if
the argument isn't dependent and repeats it during instantiation.
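
A hedged sketch (not the committed unroll-5.C or unroll-6.C tests) of the kind
of dependent unroll argument this change accepts:

template <int N>
void
fill (int *p)
{
  /* N is dependent here; its value (which must be 1..65534) is only
     checked when the template is instantiated.  */
#pragma GCC unroll N
  for (int i = 0; i < N; i++)
    p[i] = i;
}

template void fill<8> (int *);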

2023-12-04  Jakub Jelinek  <jakub@redhat.com>

PR c++/112795
gcc/cp/
* cp-tree.h (cp_convert_range_for): Change UNROLL type from
unsigned short to tree.
(finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Likewise.
* parser.cc (cp_parser_statement): Pass NULL_TREE rather than 0 to
cp_parser_iteration_statement UNROLL argument.
(cp_parser_for, cp_parser_c_for): Change UNROLL type from
unsigned short to tree.
(cp_parser_range_for): Likewise.  Set RANGE_FOR_UNROLL to just UNROLL
rather than build_int_cst from it.
(cp_convert_range_for, cp_parser_iteration_statement): Change UNROLL
type from unsigned short to tree.
(cp_parser_omp_loop_nest): Pass NULL_TREE rather than 0 to
cp_parser_range_for UNROLL argument.
(cp_parser_pragma_unroll): Return tree rather than unsigned short.
If parsed expression is type dependent, just return it, don't diagnose
issues with value if it is value dependent.
(cp_parser_pragma): Change UNROLL type from unsigned short to tree.
* semantics.cc (finish_while_stmt_cond): Change UNROLL type from
unsigned short to tree.  Build ANNOTATE_EXPR with UNROLL as its last
operand rather than build_int_cst from it.
(finish_do_stmt, finish_for_cond): Likewise.
* pt.cc (tsubst_stmt) <case RANGE_FOR_STMT>: Change UNROLL type from
unsigned short to tree and set it to RECUR on RANGE_FOR_UNROLL (t).
(tsubst_expr) <case ANNOTATE_EXPR>: For annot_expr_unroll_kind repeat
checks on UNROLL value from cp_parser_pragma_unroll.
gcc/testsuite/
* g++.dg/ext/unroll-5.C: New test.
* g++.dg/ext/unroll-6.C: New test.

5 months agoRISC-V: Update crypto vector ISA info with latest spec
Feng Wang [Mon, 4 Dec 2023 06:43:19 +0000 (06:43 +0000)] 
RISC-V: Update crypto vector ISA info with latest spec

This patch adds the Zvkb subset of the crypto vector extension.  The
corresponding test cases have also been modified.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zvkb ISA info.
* config/riscv/riscv.opt: Add Mask(ZVKB).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn-1.c: Replace zvbb with zvkb.
* gcc.target/riscv/zvkn.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.

5 months agoprefer Zicond primitive semantics to SFB
Fei Gao [Tue, 28 Nov 2023 02:32:24 +0000 (02:32 +0000)] 
prefer Zicond primitive semantics to SFB

Move Zicond md files ahead of SFB to recognize Zicond first.

Take the following case for example.

CFLAGS: -mtune=sifive-7-series -march=rv64gc_zicond -mabi=lp64d

long primitiveSemantics_00(long a, long b) { return a == 0 ? 0 : b; }

before patch:
primitiveSemantics_00:
bne a0,zero,1f # movcc
mv a1,zero
1:
mv a0,a1
ret

after patch:
primitiveSemantics_00:
czero.eqz a0,a1,a0
ret

Co-authored-by: Xiao Zeng <zengxiao@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/riscv.md (*mov<GPR:mode><X:mode>cc): Move to sfb.md.
* config/riscv/sfb.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-sfb-primitiveSemantics.c: New test.

5 months agoRISC-V: Add sifive-x280 to -mcpu
Kito Cheng [Mon, 4 Dec 2023 06:17:52 +0000 (14:17 +0800)] 
RISC-V: Add sifive-x280 to -mcpu

x280 is one of the SiFive cores; it has been released for a while, and
upstream LLVM already supports it.

[1] https://www.sifive.com/cores/intelligence-x280

gcc/ChangeLog:

* config/riscv/riscv-cores.def: Add sifive-x280.
* doc/invoke.texi (RISC-V Options): Add sifive-x280.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-sifive-x280.c: New test.

5 months agoRISC-V: Refactor riscv_implied_info_t to make it able to handle conditional implicati...
Kito Cheng [Mon, 27 Nov 2023 14:01:44 +0000 (22:01 +0800)] 
RISC-V: Refactor riscv_implied_info_t to make it able to handle conditional implication [NFC]

RISC-V ISA implication rules have become a little more complicated than before
and may now come with conditions, so this commit extends the capability of
riscv_implied_info_t and also makes it a bit more C++-like.
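
Purely as a hedged illustration of the idea (the field names, types and
predicate shown below are assumptions for this sketch, not GCC's actual
declarations):

#include <functional>
#include <string>

struct subset_list            /* stand-in for riscv_subset_list */
{
  bool lookup (const std::string &ext) const { return ext == "f"; }
};

/* An implication entry that may carry a condition, in the spirit of the
   extended riscv_implied_info_t.  */
struct implied_info
{
  std::string ext;       /* extension that implies another extension */
  std::string implied;   /* the extension being implied              */
  std::function<bool (const subset_list &)> predicate;  /* optional  */

  bool match (const subset_list &list, const std::string &e) const
  {
    return e == ext && (!predicate || predicate (list));
  }
};

/* Illustrative conditional entry: zce implying zcf only under a condition
   (the real rule also depends on F/RV32).  */
static const implied_info zce_implies_zcf
  = { "zce", "zcf", [] (const subset_list &l) { return l.lookup ("f"); } };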

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_predicator_t): New.
(riscv_implied_info_t::riscv_implied_info_t): New.
(riscv_implied_info_t::match): New.
(riscv_implied_info): New entry for zcf.
(riscv_subset_list::handle_implied_ext): Use
riscv_implied_info_t::match.
(riscv_subset_list::check_implied_ext): Ditto.
(riscv_subset_list::handle_combine_ext): Ditto.
(riscv_subset_list::parse): Move zcf implication handling to
riscv_implied_infos.

5 months agoRISC-V: Refine riscv_subset_list::parse [NFC]
Kito Cheng [Mon, 27 Nov 2023 07:28:30 +0000 (15:28 +0800)] 
RISC-V: Refine riscv_subset_list::parse [NFC]

Extract the logic for checking conflicting extensions into a stand-alone
function, preparing to add more checking logic.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::check_conflict_ext): New.
(riscv_subset_list::parse): Move checking conflict ext. to
check_conflict_ext.
* config/riscv/riscv-subset.h:
Add riscv_subset_list::check_conflict_ext.

5 months agoi386: Fix CPUID of USER_MSR.
Hu, Lin1 [Mon, 27 Nov 2023 03:28:00 +0000 (11:28 +0800)] 
i386: Fix CPUID of USER_MSR.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Move USER_MSR
to the correct location.

gcc/testsuite/ChangeLog:

* gcc.target/i386/user_msr-1.c: Correct the MSR index to give the user
a proper example.

5 months agoPR modula2/112825: modula2 builds target objects as part of all-gcc
Gaius Mulley [Mon, 4 Dec 2023 01:35:46 +0000 (01:35 +0000)] 
PR modula2/112825: modula2 builds target objects as part of all-gcc

This patch fixes PR modula2/112825, which fails if the target
assembler is not present on the host.  This can be seen if the
build invokes make all-gcc.  m2 should not attempt to generate
target libraries when performing make all-gcc.

Prior to this patch it generated build/gcc/m2/gm2-libs/SYSTEM.def
using the script gcc/m2/tools-src/makeSystem (and gm2 -c).
makeSystem should invoke gm2 -S (and other flags) instead
to generate the list of target data types without requiring any
target tools.  The target types emitted are textually converted
into SYSTEM.def.

gcc/m2/ChangeLog:

PR modula2/112825
* tools-src/makeSystem: Change all occurrences of -c to -S.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
5 months agoRISC-V: Robustify the W43, W86, W87 constraint enabled attribute
Juzhe-Zhong [Sun, 3 Dec 2023 22:47:41 +0000 (06:47 +0800)] 
RISC-V: Robustify the W43, W86, W87 constraint enabled attribute

Committed as an obvious fix.

gcc/ChangeLog:

* config/riscv/riscv.md: Robustify the constraints.

5 months agoLoongArch: Add intrinsic function descriptions for LSX and LASX instructions to doc.
chenxiaolong [Tue, 7 Nov 2023 03:53:39 +0000 (11:53 +0800)] 
LoongArch: Add intrinsic function descriptions for LSX and LASX instructions to doc.

gcc/ChangeLog:

* doc/extend.texi: Add descriptions of the intrinsic functions for the
LSX and LASX vector instructions.

5 months agoDaily bump.
GCC Administrator [Mon, 4 Dec 2023 00:16:38 +0000 (00:16 +0000)] 
Daily bump.