Alexandre Oliva [Thu, 7 Dec 2023 03:38:18 +0000 (00:38 -0300)]
analyzer: deal with -fshort-enums
On platforms that enable -fshort-enums by default, various switch-enum
analyzer tests fail, because apply_constraints_for_gswitch doesn't
expect the integral promotion type cast. I've arranged for the code
to cope with those casts.
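A minimal illustration of the situation (a hypothetical reduction, not one of
the testsuite cases):

enum e { A, B, C };

int
f (enum e x)
{
  /* With -fshort-enums, gimple performs the integral promotion
     explicitly: _1 = (int) x_2(D); switch (_1) ...
     and apply_constraints_for_gswitch now looks through that cast.  */
  switch (x)
    {
    case A: return 1;
    default: return 0;
    }
}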
for gcc/analyzer/ChangeLog
* region-model.cc (has_nondefault_case_for_value_p): Take the
enum type as a parameter.
(region_model::apply_constraints_for_gswitch): Cope with
integral promotion type casts.
Alexandre Oliva [Thu, 7 Dec 2023 03:38:14 +0000 (00:38 -0300)]
libsupc++: try cxa_thread_atexit_impl at runtime
g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
a platform that lacks __cxa_thread_atexit_impl, even if the program is
built and run using that toolchain on a (later) platform that offers
__cxa_thread_atexit_impl.
This patch adds runtime testing for __cxa_thread_atexit_impl on select
platforms (GNU variants, for starters) that support weak symbols.
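A minimal sketch of the weak-symbol idiom involved (illustrative, not the
exact libsupc++ code):

extern "C" int __cxa_thread_atexit_impl (void (*) (void *), void *, void *)
  __attribute__ ((__weak__));

extern "C" int
__cxa_thread_atexit (void (*dtor) (void *), void *obj, void *dso)
{
  /* The weak reference is null if the C library running underneath
     lacks the symbol.  */
  if (__cxa_thread_atexit_impl)
    return __cxa_thread_atexit_impl (dtor, obj, dso);
  /* ... otherwise fall back to the generic implementation ...  */
  return 0;
}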
aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics
The LDAP1 and STL1 Neon ACLE intrinsics operate on 64-bit data
values in single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD register
configurations, in either the DI or DF modes. This leads to the
need for a mode iterator accounting for the V1DI, V1DF, V2DI and
V2DF modes.
This patch therefore introduces the new V12DIF mode iterator with
which to generate functions operating on signed 64-bit integer and
float values and V12DIUP for generating the unsigned and
polynomial-type counterparts. Along with this, we modify the
associated mode attributes accordingly in order to allow for the
implementation of the relevant backend patterns for the intrinsics.
gcc/ChangeLog:
* config/aarch64/iterators.md (V12DIF): New.
(V12DUP): Likewise.
(VEL): Add support for all V12DIF-associated modes.
(Vetype): Add support for V1DI and V1DF.
(Vel): Likewise.
Hongyu Wang [Sat, 2 Dec 2023 04:55:59 +0000 (12:55 +0800)]
[APX NDD] Support TImode shift for NDD
TImode shifts are split by splitter functions, which assume
operands[0] and operands[1] to be the same. For the NDD alternative the
assumption may not hold, so add split functions for NDD to emit the NDD
form instructions, and omit the handling of the non-64-bit target split.
Although the NDD form allows a memory src, for the post-reload splitter
there is no extra register available to accept an NDD form shift,
especially shld/shrd. So only accept the register alternative for the
shift src under NDD.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
function to split NDD form lshift.
(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
prototype.
(ix86_split_rshift_ndd): Likewise.
* config/i386/i386.md (ashl<mode>3_doubleword): Add NDD
alternative, call ndd split function when operands[0]
not equal to operands[1].
(define_split for doubleword lshift): Likewise.
(define_peephole for doubleword lshift): Likewise.
(<insn><mode>3_doubleword): Likewise for l/ashiftrt.
(define_split for doubleword l/ashiftrt): Likewise.
(define_peephole for doubleword l/ashiftrt): Likewise.
Hongyu Wang [Wed, 8 Nov 2023 08:04:26 +0000 (16:04 +0800)]
[APX NDD] Support APX NDD for cmove insns
gcc/ChangeLog:
* config/i386/i386.md (*mov<mode>cc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.
Hongyu Wang [Tue, 7 Nov 2023 08:28:28 +0000 (16:28 +0800)]
[APX NDD] Support APX NDD for shld/shrd insns
For shld/shrd insns, the old patterns use match_dup 0 as the shift src and
"+r*m" as its constraint. To support NDD we add new define_insns to handle
the NDD form pattern, with an extra input and the dest operand fixed in a
register.
Hongyu Wang [Tue, 31 Oct 2023 06:21:16 +0000 (14:21 +0800)]
[APX NDD] Support APX NDD for rotate insns
gcc/ChangeLog:
* config/i386/i386.md (*<insn><mode>3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*<insn>si3_1_zext): Likewise.
(*<insn><mode>3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternative.
(rcrdi2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
Hongyu Wang [Wed, 25 Oct 2023 08:26:49 +0000 (16:26 +0800)]
[APX NDD] Support APX NDD for right shift insns
Similar to left shift, right shifts do not need to omit $1 for the NDD form.
gcc/ChangeLog:
* config/i386/i386.md (ashr<mode>3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr<mode>3_1): Likewise for SI/DI mode.
(*lshr<mode>3_1): Likewise.
(*<insn>si3_1_zext): Likewise.
(*ashr<mode>3_1): Likewise for QI/HI mode.
(*lshrqi3_1): Likewise.
(*lshrhi3_1): Likewise.
(<insn><mode>3_cmp): Likewise.
(*<insn><mode>3_cconly): Likewise.
(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*highpartdisi2): Likewise.
(*<insn>si3_cmp_zext): Likewise.
(<insn><mode>3_carry): Likewise.
Hongyu Wang [Wed, 25 Oct 2023 07:07:29 +0000 (15:07 +0800)]
[APX NDD] Support APX NDD for left shift insns
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD under which
shl by 1 can be optimized to add. As the NDD form of add requires the src
operand to be a register, since NDD cannot take two memory srcs, we currently
just keep using the NDD form shift instead of add.
The optimization TARGET_SHIFT1 tries to remove the constant 1 to use a shorter
opcode, but under NDD the assembler will automatically use the shorter opcode
whether $1 is present or not, so do not involve NDD with it.
The doubleword insns for left shift call ix86_expand_ashl, which assumes all
shift-related patterns have the same operands[0] and operands[1]. These
patterns will be supported in a standalone patch.
gcc/ChangeLog:
* config/i386/i386.md (*ashl<mode>3_1): Extend with new
alternatives to support NDD, limit the new alternative to
generate sal only, and adjust output template for NDD.
(*ashlsi3_1_zext): Likewise.
(*ashlhi3_1): Likewise.
(*ashlqi3_1): Likewise.
(*ashl<mode>3_cmp): Likewise.
(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*ashl<mode>3_cconly): Likewise.
(*ashl<dwi>3_doubleword_highpart): Adjust codegen for NDD.
Kong Lingling [Fri, 19 May 2023 02:50:29 +0000 (10:50 +0800)]
[APX NDD] Support APX NDD for or/xor insn
Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization of NDD OR/XOR.
Also adjust *one_cmplsi2_2_zext and its corresponding splitter, which will
generate an xor insn.
gcc/ChangeLog:
* config/i386/i386.md (<code><mode>3): Add new alternative for NDD
and adjust output templates.
(*<code><mode>_1): Likewise.
(*<code>qi_1): Likewise.
(*notxor<mode>_1): Likewise.
(*<code>si_1_zext): Likewise.
(*notxorqi_1): Likewise.
(*<code><mode>_2): Likewise.
(*<code>si_2_zext): Likewise.
(*<code>si_2_zext_imm): Likewise.
(*<code>si_1_zext_imm): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*one_cmplsi2_2_zext): Likewise.
(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
operands[3].
(*<code><dwi>3_doubleword): Add NDD constraints, adopt '&' to NDD dest
and emit move for optimized case if operands[0] != operands[1] or
operands[4] != operands[5].
(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
form OR/XOR insn to <any_logic:code>qi_ext<mode>_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *<code><mode>3_1_slp.
Kong Lingling [Wed, 17 May 2023 09:20:37 +0000 (17:20 +0800)]
[APX NDD] Support APX NDD for and insn
For the NDD form of the AND insn, there are three splitter fixes after
extending the legacy patterns.
1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so
some optimization splitters that generate a highpart zero_extract for QImode
need to be prohibited for the NDD pattern.
2. The legacy AND insn uses the r/qm/L constraint, and a post-reload splitter
transforms it into a zero_extend move. But for the NDD form of AND the
splitter is not strict enough: it assumes such an AND will have a const_int
operand matching the constraint "L", whereas the NDD form of AND allows
const_ints with any QImode value. Restrict the splitter condition to match
the "L" constraint, which strictly matches the zero-extend semantics.
3. The legacy AND insn adopts the r/0/Z constraint, and a splitter tries to
optimize such a form into a strict_lowpart QImode AND when the 7th bit is not
set. But the splitter would wrongly convert the non-zext NDD form of AND with
a memory src; the strict_lowpart transform then matches alternative 1 of
*<code><mode>_slp_1 and generates *movstrict<mode>_1, so the zext semantics
are dropped. This could leave the highpart of the dest uncleared and generate
wrong code. Disable the splitter when NDD is adopted and operands[0] and
operands[1] are not equal.
gcc/ChangeLog:
* config/i386/i386.md (and<mode>3): Add NDD alternatives and adjust
output template.
(*anddi_1): Likewise.
(*and<mode>_1): Likewise.
(*andqi_1): Likewise.
(*andsi_1_zext): Likewise.
(*anddi_2): Likewise.
(*andsi_2_zext): Likewise.
(*andqi_2_maybe_si): Likewise.
(*and<mode>_2): Likewise.
(*and<dwi>3_doubleword): Add NDD alternative, adopt '&' to NDD dest and
emit move for optimized case if operands[0] not equal to operands[1].
(define_split for QI highpart AND): Prohibit splitter to split NDD
form AND insn to <any_logic:code>qi_ext<mode>_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *<code><mode>3_1_slp.
(define_split for zero_extend and optimization): Prohibit splitter to
split NDD form AND insn to zero_extend insn.
Kong Lingling [Mon, 22 May 2023 02:08:39 +0000 (10:08 +0800)]
[APX NDD] Support APX NDD for not insn
*one_cmplsi2_2_zext will be split to xor, so its NDD form will be added
together with the xor NDD support.
gcc/ChangeLog:
* config/i386/i386.md (one_cmpl<mode>2): Add new constraints for NDD
and adjust output template.
(*one_cmpl<mode>2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*one_cmpl<dwi>2_doubleword): Likewise, and adopt '&' to NDD dest.
(*one_cmpl<mode>2_2): Likewise.
(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
Kong Lingling [Fri, 19 May 2023 09:15:52 +0000 (17:15 +0800)]
[APX NDD] Support APX NDD for neg insn
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
and adjust for NDD.
* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
adjust output template.
(*neg<mode>_1): Likewise.
(*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest.
(*neg<mode>_2): Likewise.
(*neg<mode>_ccc_1): Likewise.
(*neg<mode>_ccc_2): Likewise.
(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*negsi_2_zext): Likewise.
Kong Lingling [Wed, 18 Jan 2023 07:51:23 +0000 (15:51 +0800)]
[APX NDD] Support APX NDD for sbb insn
Similar to *add<dwi>3_doubleword, operands[1] may not equal operands[0], so
an extra move and an earlyclobber are required.
gcc/ChangeLog:
* config/i386/i386.md (*sub<dwi>3_doubleword): Add new alternative for
NDD, adopt '&' modifier to NDD dest and emit move when operands[0] not
equal to operands[1].
(*sub<dwi>3_doubleword_zext): Likewise.
(*subv<dwi>4_doubleword): Likewise.
(*subv<dwi>4_doubleword_1): Likewise.
(*subv<mode>4_overflow_1): Add NDD alternatives and adjust output
templates.
(*subv<mode>4_overflow_2): Likewise.
(@sub<mode>3_carry): Likewise.
(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*subsi3_carry_zext): Likewise.
(subborrow<mode>): Pass TARGET_APX_NDD to ix86_binary_operator_ok.
(subborrow<mode>_0): Likewise.
(*sub<mode>3_eq): Likewise.
(*sub<mode>3_ne): Likewise.
(*sub<mode>3_eq_1): Likewise.
Kong Lingling [Wed, 18 Jan 2023 09:52:52 +0000 (17:52 +0800)]
[APX NDD] Support APX NDD for adc insns
Legacy adc patterns are commonly adopted for TImode add; when extending
TImode add to the NDD version, operands[0] and operands[1] can be different,
so an extra move should be emitted if those patterns have an optimization for
adding const0_rtx.
For TImode insns, there could be register overlap between operands[0]
and operands[1], as x86 allocates TImode registers sequentially, like rax:rdi,
rdi:rdx. After the postreload split for TImode, the write to the 1st highpart
rdi would be overridden by the 2nd lowpart rdi if the 2nd lowpart rdi has a
different src as input; the write to the 1st highpart rdi would then be lost
and cause miscompilation.
In addition, when the input operands contain memory, the address register may
also overlap with the dest register if it is marked dead after one of the
highpart/lowpart operations is done.
So the earlyclobber modifier '&' should be added to the NDD dest to avoid
overlap between the dest and src operands.
NDD instructions automatically zero-extend the dest register to 64 bits, so
the zext patterns can adopt all NDD forms that have a memory src input.
gcc/ChangeLog:
* config/i386/i386.md (*add<dwi>3_doubleword): Add ndd alternatives,
adopt '&' to ndd dest and move operands[1] to operands[0] when they are
not equal.
(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
(*addv<dwi>4_doubleword): Likewise.
(*addv<dwi>4_doubleword_1): Likewise.
(*add<dwi>3_doubleword_zext): Likewise.
(addv<mode>4_overflow_1): Add ndd alternatives.
(*addv<mode>4_overflow_2): Likewise.
(@add<mode>3_carry): Likewise.
(*add<mode>3_carry_0): Likewise.
(*addsi3_carry_zext): Likewise.
(addcarry<mode>): Likewise.
(addcarry<mode>_0): Likewise.
(*addcarry<mode>_1): Likewise.
(*add<mode>3_eq): Likewise.
(*add<mode>3_ne): Likewise.
(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
Hongyu Wang [Mon, 13 Nov 2023 10:49:07 +0000 (18:49 +0800)]
[APX NDD] Disable seg_prefixed memory usage for NDD add
NDD uses an evex prefix, so when a segment prefix is also applied, the
instruction could exceed the 15-byte instruction length limit, especially
when adding immediates. This can happen when the "e" constraint accepts any
UNSPEC_TPOFF/UNSPEC_NTPOFF constant: the insn then adds the offset to the
segment register, which will be encoded using a segment prefix.
Disable such *POFF constant usage in NDD add alternatives with a new
constraint.
gcc/ChangeLog:
* config/i386/constraints.md (je): New constraint.
* config/i386/i386-protos.h (x86_poff_operand_p): New prototype.
* config/i386/i386.cc (x86_poff_operand_p): New function to
check for any *POFF constant in the operand.
* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
Kong Lingling [Wed, 14 Dec 2022 02:10:19 +0000 (10:10 +0800)]
[APX NDD] Support Intel APX NDD for legacy add insn
APX NDD provides an extra destination register operand for several
GPR-related legacy insns, so a new alternative can be adopted, with
operand 1 taking the "r" constraint.
This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers, since lea has a shorter encoding. For add
operations containing a memory operand, NDD will be adopted to save an extra
move.
The legacy x86 binary-operation expander forces operands[0] and operands[1]
to be the same, so add a helper function that allows the NDD form pattern,
in which operands[0] and operands[1] can be different.
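Illustratively, assuming the assembler's three-operand AT&T syntax for NDD
(destination last; `bar' is a placeholder symbol):

    movq   bar(%rip), %rax          # legacy: load first,
    addq   %rdx, %rax               # then a two-operand add
    addq   %rdx, bar(%rip), %rax    # NDD: one insn, the move is saved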
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
new use_ndd flag to check whether ndd can be used for this binop
and adjust operand emit.
(ix86_binary_operator_ok): Likewise.
(ix86_expand_binary_operator): Likewise, and avoid the postreload
expander generating the lea pattern when use_ndd is explicitly passed.
* config/i386/i386-options.cc (ix86_option_override_internal):
Prohibit apx subfeatures when not in 64bit mode.
* config/i386/i386-protos.h (ix86_binary_operator_ok):
Add use_ndd flag.
(ix86_fixup_binary_operands): Likewise.
(ix86_expand_binary_operator): Likewise.
* config/i386/i386.md (*add<mode>_1): Extend with new alternatives
to support NDD, and adjust output template.
(*addhi_1): Likewise.
(*addqi_1): Likewise.
David Malcolm [Thu, 7 Dec 2023 00:25:26 +0000 (19:25 -0500)]
analyzer: fix taint false positives with UNKNOWN [PR112850]
PR analyzer/112850 reports a false positive from
-Wanalyzer-tainted-allocation-size on the Linux kernel [1] where
-fanalyzer complains that an allocation size is attacker-controlled
despite the value being correctly sanitized against upper and lower
limits.
The root cause is that the expression is sufficiently complex
to exceed the -param=analyzer-max-svalue-depth= threshold,
currently at 12, with depth 13, and so it is treated as UNKNOWN.
Hence the sanitizations are seen as comparisons of an UNKNOWN
symbolic value against constants, and these were being ignored
by the taint state machine.
The expression in question is relatively typical for those seen in
Linux kernel ioctl handlers, and I was surprised that it had exceeded
the analyzer's default expression complexity limit.
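A schematic example of the kind of sanitized, attacker-controlled size
involved (purely illustrative; MIN_SZ/MAX_SZ are made-up bounds, and this is
not the kernel code):

#include <stdlib.h>

#define MIN_SZ 16
#define MAX_SZ 4096

void *
handler (unsigned long sz)  /* sz is attacker-controlled */
{
  /* Bounds checks that the taint machine must honor even when sz's
     symbolic value has decayed to UNKNOWN.  */
  if (sz < MIN_SZ || sz > MAX_SZ)
    return 0;
  return malloc (sz);
}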
This patch addresses this problem in three ways:
(a) the default value of the threshold parameter is increased, from 12
to 18, so that such expressions are precisely handled
(b) a new -Wanalyzer-symbol-too-complex warning is added for when the symbol
complexity limit is reached. This is off by default for users, and
on by default in the test suite.
(c) the taint state machine now handles comparisons against UNKNOWN svalues
by dropping all taint information on that execution path, so that if
the complexity limit has been exceeded we don't generate false positives.
As well as fixing the taint false positive (PR analyzer/112850), the
patch also fixes a couple of leak false positives seen on flex-generated
scanners (PR analyzer/103546).
[1] specifically, in sound/core/rawmidi.c's handler for
SNDRV_RAWMIDI_STREAM_OUTPUT.
aarch64: Add support for GCS system registers with the +gcs modifier
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow for them to be recognized by the compiler.
aarch64: Add march flags for +the and +d128 arch extensions
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.
The `+d128' -march modifier enables the use of the following ACLE
builtin functions:
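__uint128_t __arm_rsr128 (const char *special_register);
void __arm_wsr128 (const char *special_register, __uint128_t value);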
and defines the __ARM_FEATURE_SYSREG128 macro to 1.
Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
implementations are also reliant on the enablement of the `+the' flag,
which is thus also implemented in this patch.
Edwin Lu [Wed, 6 Dec 2023 00:15:10 +0000 (16:15 -0800)]
RISC-V: Remove xfail from ssa-fre-3.c testcase
Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c, where the
xfail was introduced. The test did pass at that hash and has continued to
pass since then. Remove the xfail.
Eric Gallager [Mon, 4 Dec 2023 15:13:55 +0000 (10:13 -0500)]
remove qmtest-related Makefile targets
On GitHub, Joseph Myers (@jsm28 there) says in MentorEmbedded/qmtest#1
that the qmtest-related targets should have been removed long ago. This
patch does so.
David Malcolm [Wed, 6 Dec 2023 17:35:44 +0000 (12:35 -0500)]
diagnostics: prettify JSON output formats
Previously our JSON output emitted the JSON all on one line, with
no indentation to show the structure of the values.
Although it's easy to reformat such output (e.g. with
"python -m json.tool"), I've found it's a pain to need to do so
e.g. my text editor sometimes hangs when opening a multimegabyte
json file all on one line. Similarly diff-ing is easier if the
json is already formatted.
This patch adds whitespace to the json output to show the structure.
It turned out to be fairly easy to implement using pretty_printer's
existing indentation machinery.
The patch uses this formatting for the various JSON-based diagnostic
output formats.
For example, with this patch, the output from
-fdiagnostics-format=json-stderr takes a shape along these lines (an
abbreviated, illustrative sample; values elided):
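[
    {
        "kind": "warning",
        "message": "...",
        "locations": [
            {
                "caret": {
                    "file": "...",
                    "line": 1,
                    "column": 1
                }
            }
        ]
    }
]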
I was able to update almost all of our DejaGnu test cases for JSON to
handle this format tweak, and IMHO it improved the readability of these
test cases, but a couple were more awkward. Hence I added
-fno-diagnostics-json-formatting as an option to disable this
formatting.
The formatting does not affect the output of -fsave-optimization-record
or the JSON output from gcov (but this could be enabled if desirable).
gcc/analyzer/ChangeLog:
* engine.cc (dump_analyzer_json): Use
flag_diagnostics_json_formatting.
gcc/ChangeLog:
* common.opt (fdiagnostics-json-formatting): New.
* diagnostic-format-json.cc: Add "formatted" boolean
to json_output_format and subclasses, and to the
diagnostic_output_format_init_json_* functions. Use it when
printing JSON.
* diagnostic-format-sarif.cc: Likewise for sarif_builder,
sarif_output_format, and the various
diagnostic_output_format_init_sarif_* functions.
* diagnostic.cc (diagnostic_output_format_init): Add
"json_formatting" boolean and pass on to the various cases.
* diagnostic.h (diagnostic_output_format_init): Add
"json_formatted" param.
(diagnostic_output_format_init_json_stderr): Add "formatted" param.
(diagnostic_output_format_init_json_file): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* doc/invoke.texi (-fdiagnostics-format=json): Remove discussion
about JSON output needing formatting.
(-fno-diagnostics-json-formatting): Add.
* gcc.cc (driver_handle_option): Use
opts->x_flag_diagnostics_json_formatting.
* gcov.cc (generate_results): Pass "false" for new formatting
option when printing json.
* json.cc (value::dump): Add new "formatted" param.
(object::print): Likewise, using it to add whitespace to format
the JSON output.
(array::print): Likewise.
(float_number::print): Add new "formatted" param.
(integer_number::print): Likewise.
(string::print): Likewise.
(literal::print): Likewise.
(selftest::assert_print_eq): Add "formatted" param.
(ASSERT_PRINT_EQ): Add "FORMATTED" param.
(selftest::test_writing_objects): Test both formatted and
unformatted printing.
(selftest::test_writing_arrays): Likewise.
(selftest::test_writing_float_numbers): Update for new param of
ASSERT_PRINT_EQ.
(selftest::test_writing_integer_numbers): Likewise.
(selftest::test_writing_strings): Likewise.
(selftest::test_writing_literals): Likewise.
(selftest::test_formatting): New.
(selftest::json_cc_tests): Call it.
* json.h (value::print): Add "formatted" param.
(value::dump): Likewise.
(object::print): Likewise.
(array::print): Likewise.
(float_number::print): Likewise.
(integer_number::print): Likewise.
(string::print): Likewise.
(literal::print): Likewise.
* optinfo-emit-json.cc (optrecord_json_writer::write): Pass
"false" for new formatting option when printing json.
(selftest::test_building_json_from_dump_calls): Likewise.
* opts.cc (common_handle_option): Use
opts->x_flag_diagnostics_json_formatting.
gcc/ChangeLog:
* diagnostic-format-json.cc (on_begin_diagnostic): Convert param
to const reference.
(on_end_diagnostic): Likewise.
(json_output_format::on_end_diagnostic): Likewise.
* diagnostic-format-sarif.cc
(sarif_invocation::add_notification_for_ice): Likewise.
(sarif_result::on_nested_diagnostic): Likewise.
(sarif_ice_notification::sarif_ice_notification): Likewise.
(sarif_builder::end_diagnostic): Likewise.
(sarif_builder::make_result_object): Likewise.
(make_reporting_descriptor_object_for_warning): Likewise.
(sarif_builder::make_locations_arr): Likewise.
(sarif_output_format::on_begin_diagnostic): Likewise.
(sarif_output_format::on_end_diagnostic): Likewise.
* diagnostic.cc (default_diagnostic_starter): Make diagnostic_info
param const.
(default_diagnostic_finalizer): Likewise.
(diagnostic_context::report_diagnostic): Pass diagnostic by
reference to on_{begin,end}_diagnostic.
(diagnostic_text_output_format::on_begin_diagnostic): Convert
param to const reference.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
* diagnostic.h (diagnostic_starter_fn): Make diagnostic_info param
const.
(diagnostic_finalizer_fn): Likewise.
(diagnostic_output_format::on_begin_diagnostic): Convert param to
const reference.
(diagnostic_output_format::on_end_diagnostic): Likewise.
(diagnostic_text_output_format::on_begin_diagnostic): Likewise.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
(default_diagnostic_starter): Make diagnostic_info param const.
(default_diagnostic_finalizer): Likewise.
* langhooks-def.h (lhd_print_error_function): Make diagnostic_info
param const.
* langhooks.cc (lhd_print_error_function): Likewise.
* langhooks.h (lang_hooks::print_error_function): Likewise.
* tree-diagnostic.cc (diagnostic_report_current_function):
Likewise.
(default_tree_diagnostic_starter): Likewise.
(virt_loc_aware_diagnostic_finalizer): Likewise.
* tree-diagnostic.h (diagnostic_report_current_function):
Likewise.
(virt_loc_aware_diagnostic_finalizer): Likewise.
gcc/fortran/ChangeLog:
* error.cc (gfc_diagnostic_starter): Make diagnostic_info param
const.
(gfc_diagnostic_finalizer): Likewise.
gcc/jit/ChangeLog:
* dummy-frontend.cc (jit_begin_diagnostic): Make diagnostic_info
param const.
(jit_end_diagnostic): Likewise. Pass to add_diagnostic by
reference.
* jit-playback.cc (jit::playback::context::add_diagnostic):
Convert diagnostic_info to const reference.
* jit-playback.h (jit::playback::context::add_diagnostic):
Likewise.
Andrew Stubbs [Mon, 30 Jan 2023 14:43:00 +0000 (14:43 +0000)]
amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass the low-latency heap allocation to the kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
* libgomp.texi: Document low-latency implementation details.
Andrew Stubbs [Thu, 27 Jan 2022 13:48:50 +0000 (13:48 +0000)]
openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
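For example, a sketch using the standard OpenMP allocator API:

#include <omp.h>

void
use_lowlat (void)
{
  /* access:cgroup restricts use of the allocation to the contention
     group, which team-local low-latency memory can satisfy.  */
  omp_alloctrait_t traits[] = { { omp_atk_access, omp_atv_cgroup } };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_low_lat_mem_space, 1, traits);
  void *p = omp_alloc (1024, al);
  omp_free (p, al);
  omp_destroy_allocator (al);
}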
libgomp/ChangeLog:
* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_init_allocator): Use MEMSPACE_VALIDATE.
(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
* libgomp.texi: Document low-latency implementation details.
* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.
Andrew Stubbs [Fri, 3 Dec 2021 17:46:41 +0000 (17:46 +0000)]
libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on
the NVPTX GPU device, via omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe, and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.
The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.
For now, the omp_low_lat_mem_alloc allocator also works, but that will change
when I implement the access traits.
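For instance (the pool size and program name here are arbitrary
placeholders):

GOMP_NVPTX_LOWLAT_POOL=65536 ./my-offload-program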
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(predefined_alloc_mapping): New array. Add _Static_assert to match.
(ARRAY_SIZE): New macro.
(omp_aligned_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators. Simplify existing
fall-backs.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for
predefined allocators. Simplify existing fall-backs.
(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
Implement fall-backs for predefined allocators. Simplify existing
fall-backs.
* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
(__nvptx_lowlat_init): New prototype.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* libgomp.texi: Update memory space table.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* basic-allocator.c: New file.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/omp_alloc-1.c: New test.
* testsuite/libgomp.c/omp_alloc-2.c: New test.
* testsuite/libgomp.c/omp_alloc-3.c: New test.
* testsuite/libgomp.c/omp_alloc-4.c: New test.
* testsuite/libgomp.c/omp_alloc-5.c: New test.
* testsuite/libgomp.c/omp_alloc-6.c: New test.
Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
Fix c-c++-common/fhardened-[12].c test failures on hppa
The -fstack-protector and -fstack-protector-strong options are
not supported on hppa since the stack grows up.
2023-12-06 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* c-c++-common/fhardened-1.c: Ignore __SSP_STRONG__ define
if __hppa__ is defined.
* c-c++-common/fhardened-2.c: Ignore __SSP__ define
if __hppa__ is defined.
Juzhe-Zhong [Wed, 6 Dec 2023 14:26:46 +0000 (22:26 +0800)]
RISC-V: Fix VSETVL PASS bug
As PR112855 mentioned, the VSETVL PASS inserts vsetvli in unexpected
locations, due to two reasons:
1. Incorrect transparent computation of LCM data; we need to check the VL operand's defs and uses.
2. Incorrect fusion of an unrelated edge, i.e. an edge that never reaches the vsetvl expression.
Jason Merrill [Tue, 5 Dec 2023 20:28:16 +0000 (15:28 -0500)]
c++: partial ordering of object parameter [PR53499]
Looks like we implemented option 1 (skip the object parameter) for CWG532
before the issue was resolved, and never updated to the final resolution of
option 2 (model it as a reference). More recently CWG2445 extended this
handling to static member functions; I think that's wrong, and have
opened CWG2834 to address that and how explicit object member functions
interact with it.
The FIXME comments are to guide how the explicit object member function
support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.
The library testsuite changes are to make partial ordering work again
between the generic operator- in the testcase and
_Pointer_adapter::operator-.
DR 532
PR c++/53499
gcc/cp/ChangeLog:
* pt.cc (more_specialized_fn): Fix object parameter handling.
gcc/testsuite/ChangeLog:
* g++.dg/template/partial-order4.C: New test.
* g++.dg/template/spec26.C: Adjust for CWG532.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/ext_pointer/types/1.cc
* testsuite/23_containers/vector/ext_pointer/types/2.cc
(N::operator-): Make less specialized.
Marek Polacek [Tue, 5 Dec 2023 18:39:49 +0000 (13:39 -0500)]
build: unbreak bootstrap on uclinux targets [PR112762]
Currently, cross-compiling with --target=c6x-uclinux (and several other)
fails due to:
../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope
#define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level
In the PR Andrew mentions that another fix would be in config.gcc,
but really, here I meant to use the target hook for glibc only, not
uclibc. This trivial patch fixes the build problem. It means that
-fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3.
PR target/112762
gcc/ChangeLog:
* config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for
glibc only.
Thomas Schwinge [Tue, 5 Dec 2023 08:54:54 +0000 (09:54 +0100)]
Modula-2: Support '-isysroot [...]'
In GCC cross configurations (tested '--target=amdgcn-amdhsa' and
'--target=nvptx-none') with a sysroot configured, the 'gm2' driver invocations
are passed '--sysroot=[...]', which is translated into '-isysroot [...]' for
the 'cc1gm2' compiler invocation. The latter, however, gets complained about:
cc1gm2: warning: command-line option ‘-isysroot [...]’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
..., and therefore a ton of FAILs.
Reproducer (also for non-cross, native configurations):
$ build-gcc/gcc/gm2 -Bbuild-gcc/gcc -v --sysroot=/tmp -x modula-2 /dev/null
[...]
build-gcc/gcc/cc1gm2 [...] -isysroot [...]/tmp [...]
cc1gm2: warning: command-line option ‘-isysroot /tmp’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
[...]
Jakub Jelinek [Wed, 6 Dec 2023 11:27:12 +0000 (12:27 +0100)]
libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c
When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
   61 | void *__emutls_get_address (struct __emutls_object *);
      |       ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ [-Wbuiltin-declaration-mismatch]
   63 | void __emutls_register_common (struct __emutls_object *, word, word, void *);
      |      ^~~~~~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
  140 | __emutls_get_address (struct __emutls_object *obj)
      | ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ [-Wbuiltin-declaration-mismatch]
  204 | __emutls_register_common (struct __emutls_object *obj,
      | ^~~~~~~~~~~~~~~~~~~~~~~~
The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c, and the
middle-end creates a similar structure on demand when calling the builtins
(with small differences, like not having the union in there).
We have a precedent for this, e.g. for the fprintf or strftime builtins,
where the builtins are created with the magic fileptr_type_node or
const_tm_ptr_type_node types and then matched against a user definition of
pointers to some structure; but I think for this case users should never
define these functions themselves nor call them, and having special types
for them in the compiler would mean extra compile time spent during compiler
initialization and more GC data, so I think it is better to keep the
compiler as is.
On the library side, there is an option to just follow what the
compiler is doing and do
EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
word size, word align, void *templ)
{
+ struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about ABI change in libgcc.
So, the patch just turns the warning off.
2023-12-06 Thomas Schwinge <thomas@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
aarch64: Add system register duplication check selftest
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-reg.def' has any duplicate entries.
Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationship between the two does not fit into
one of the following categories:
* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another. One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: It is possible that when a given register
serves two distinct functions depending on how it is used, it
is given two distinct names whose use should match the context
under which it is being used. Example: Debug Data Transfer
Register. When used to receive data, it should be accessed as
DBGDTRRX_EL0 while when transmitting data it should be
accessed via DBGDTRTX_EL0.
* Register deprecation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding different target: Some encodings are given
different meaning depending on the target architecture and, as
such, are given different names in each of these contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
in Armv8-R targets.
A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differs, and it is this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.
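In sketch form, for two entries a and b (field names as above):

/* CPENC duplication is acceptable iff the entries differ somewhere:  */
bool acceptable = (a->properties != b->properties
                   || a->arch_reqs != b->arch_reqs);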
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): Add call to
aarch64_test_sysreg_encoding_clashes selftest.
aarch64: Add front-end argument type checking for target builtins
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.
Example:
const char *regname = "amcgcr_el0";
long long a = __builtin_aarch64_rsr64 (regname);
is reduced by the ccp1 pass to
long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.
The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.
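The shape of the check involved, sketched with GCC's tree accessors (the
variable names here are placeholders, not the exact new code):

tree arg = CALL_EXPR_ARG (call, 0);
if (TREE_CODE (arg) != STRING_CST)
  {
    error_at (location, "argument must be a string literal");
    return false;
  }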
aarch64: Implement system register validation tools
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler. In particular, this involves:
1. Ensuring a supplied string corresponds to a known system
register name. System registers can be accessed either via their
name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
Register names are validated using a hash map, mapping known
system register names to their corresponding `sysreg_t' structs; the
map is populated from the `aarch64_system_regs.def' file.
Register name validation is done via `lookup_sysreg_map', while
the encoding naming convention is validated via a parser
implemented in this patch - `is_implem_def_reg'.
2. Once a given register name is deemed to be valid, it is checked
against two further criteria:
a. Is the referenced register implemented in the target
architecture? This is achieved by comparing the ARCH field
in the relevant SYSREG entry from `aarch64_system_regs.def'
against `aarch64_feature_flags' flags set at compile-time.
b. Is the register being used correctly? Check the requested
operation against the FLAGS specified in SYSREG.
This prevents operations like writing to a read-only system
register.
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64-system-regs.def file should be as follows:
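SYSREG (NAME, CPENC (sn, op1, cn, cm, op2), FLAGS, ARCH)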
Where the arguments to SYSREG correspond to:
- NAME: The system register name, as used in the assembly language.
- CPENC: The system register encoding, mapping to:
s<sn>_<op1>_c<cn>_c<cm>_<op2>
- FLAG: The entries in the FLAGS field are bitwise-OR'd together to
encode extra information required to ensure proper use of
the system register. For example, a read-only system
register will have the flag F_REG_READ, while write-only
registers will be labeled F_REG_WRITE. Such flags are
tested against at compile-time.
- ARCH: The architectural features the system register is associated
with. This is encoded via one of three possible macros:
1. When a system register is universally implemented, we say
it has no feature requirements, so we tag it with the
AARCH64_NO_FEATURES macro.
2. When a register is only implemented for a single
architectural extension EXT, the AARCH64_FEATURE (EXT), is
used.
3. When a given system register is made available by any of N
possible architectural extensions, the AARCH64_FEATURES(N, ...)
macro is used to combine them accordingly.
In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.
Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.
aarch64: Sync system register information with Binutils
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.
By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart. This being the case, no change should be made in
the GCC copy of the file. Any modifications should first be made in
Binutils and the resulting file copied over to GCC.
GCC does not implement the full range of ISA flags present in
Binutils. Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC. Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-system-regs.def file. This is done
in the next patch in the series.
`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions. This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.
Robin Dapp [Tue, 5 Dec 2023 14:24:12 +0000 (15:24 +0100)]
RISC-V: Add vec_init expander for masks [PR112854].
PR112854 shows a problem on rv32 with zvl1024b. During the course of
expand_constructor we try to overlay/subreg a 64-element mask by a
scalar (Pmode) register. This works for zvl512b and its maximum of
32 elements but fails for rv32 and 64 elements.
To circumvent this this patch adds a vec_init expander for vector masks
by initializing a QImode vector and comparing that against 0.
gcc/ChangeLog:
PR target/112854
PR target/112872
* config/riscv/autovec.md (vec_init<mode>qi): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112854.c: New test.
* gcc.target/riscv/rvv/autovec/pr112872.c: New test.
Jakub Jelinek [Wed, 6 Dec 2023 08:59:12 +0000 (09:59 +0100)]
i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]
Regardless of the outcome of the REG_UNUSED discussions, I think
it is a good idea to move the vzeroupper pass one pass later.
As can be seen in the multiple PRs and as postreload.cc documents,
reload/LRA is known to create dead statements quite often, which
is the reason why we have postreload_cse pass at all.
Doing vzeroupper pass before such cleanup means the pass including
df_analyze for it needs to process more instructions than needed
and because mode switching adds the df note problem, there is also a higher
chance of having stale REG_UNUSED notes.
And, I really don't see why vzeroupper can't wait until those cleanups
are done.
2023-12-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/112760
* config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
after pass_postreload_cse rather than pass_reload.
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Adjust comment for it.
Jakub Jelinek [Wed, 6 Dec 2023 08:55:30 +0000 (09:55 +0100)]
lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]
A zero or sign extension from result of some upwards_2limb operation
is implemented in lower_mergeable_stmt as an extra loop which fills in
the extra bits with 0s or 1s.
If the delta of extended vs. unextended bit count is small, the code
doesn't use a loop and emits up to a couple of stores to constant indexes,
but if the delta is large, it uses
cnt = (bo_bit != 0) + 1 + (rem != 0);
statements. bo_bit is non-zero for bit-field loads, and that case is handled
as straight-line code; the unconditional 1 in there is for a loop which
handles most of the limbs in the delta; and finally (rem != 0) is for the
case when the extended precision is not a multiple of limb_prec, which is
again handled in straight-line code (after the loop).
The testcase ICEs because the decision of which idx to use was incorrect
for kind == bitint_prec_huge (i.e. when the precision delta is very large)
and rem == 0 (i.e. when the extended precision is a multiple of limb_prec).
In that case cnt is either 1 (if bo_bit == 0) or 2, and idx should
be either first size_int (start) and then the result of create_loop (for
bo_bit != 0), or just the result of create_loop; but by mistake the last
case used size_int (end), which means that when the precision is a multiple
of limb_prec we were storing above the precision (which ICEs; and also the
loop which is needed was not emitted).
2023-12-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112809
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For
separate_ext in kind == bitint_prec_huge mode if rem == 0, create for
i == cnt - 1 the loop rather than using size_int (end).
Jakub Jelinek [Wed, 6 Dec 2023 08:54:03 +0000 (09:54 +0100)]
driver: Fix bootstrap with --enable-default-pie
On IRC Iain mentioned bootstrap is broken for him, presumably since the
r14-5791 -fhardened addition. I think it is only a problem with
--enable-default-pie, where the case OPT_pie: wants to fall through
into case OPT_r: and warns.
Before the patch, validated = true; was set if ENABLE_DEFAULT_PIE
for OPT_pie, and for -fhardened, as documented, I think we want to
set any_link_options_p = true; for it too:
/* True if -r, -shared, -pie, or -no-pie were specified on the command
line. */
static bool any_link_options_p;
2023-12-06 Jakub Jelinek <jakub@redhat.com>
* gcc.cc (driver_handle_option): Add /* FALLTHROUGH */ comment
between OPT_pie and OPT_r cases.
Richard Biener [Tue, 5 Dec 2023 08:21:35 +0000 (09:21 +0100)]
tree-optimization/112843 - update_stmt doing wrong things
The following removes range_query::update_stmt and its single
invocation from update_stmt_operands. That function is not
supposed to look beyond the raw stmt contents of the passed
stmt since there's no guarantee about the rest of the IL.
PR tree-optimization/112843
* tree-ssa-operands.cc (update_stmt_operands): Do not call
update_stmt from ranger.
* value-query.h (range_query::update_stmt): Remove.
* gimple-range.h (gimple_ranger::update_stmt): Likewise.
* gimple-range.cc (gimple_ranger::update_stmt): Likewise.
This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.
Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception. There are two ways to go about it: at-calls, that
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.
Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.
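A hedged usage sketch (the mode spellings follow the modes named above and
should be treated as illustrative):

/* at-calls: the function's visible interface is modified.  */
void f (void) __attribute__ ((strub ("at-calls")));

/* internal: the body moves to a clone; the wrapper keeps the ABI.  */
void g (void) __attribute__ ((strub ("internal")));

/* functions reading this variable get stack scrubbing implicitly.  */
int sensitive __attribute__ ((strub));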
There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels. In the default mode,
relaxed, this requirement is not enforced by the compiler.
The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing. Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.
There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used. There are also options meant to use for testing the feature,
enabling different strubbing modes for all (viable) functions.
for gcc/ChangeLog
* Makefile.in (OBJS): Add ipa-strub.o.
(GTFILES): Add ipa-strub.cc.
* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_builtin): Call them.
* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.
* gengtype-lex.l: Ignore multi-line pp-directives.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.cc, ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.
* doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
(TARGET_STRUB_MAY_USE_MEMSET): New.
* doc/tm.texi: Rebuilt.
* cgraph.h (symtab_node::reset): Add preserve_comdat_group
param, with a default.
* cgraphunit.cc (symtab_node::reset): Use it.
for gcc/c-family/ChangeLog
* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.
* gcc-interface/utils.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(gnat_internal_attribute_table): Add strub.
Jonathan Wakely [Tue, 24 Oct 2023 19:15:12 +0000 (20:15 +0100)]
libstdc++: Add workaround to std::ranges::subrange [PR111948]
libstdc++-v3/ChangeLog:
PR libstdc++/111948
* include/bits/ranges_util.h (subrange): Add constructor to
_Size to avoid setting member in constructor.
* testsuite/std/ranges/subrange/111948.cc: New test.
Jonathan Wakely [Sun, 26 Nov 2023 21:32:35 +0000 (21:32 +0000)]
libstdc++: Implement LWG 4016 for std::ranges::to
This implements the proposed resolution of LWG 4016, so that
std::ranges::to does not use std::back_inserter and std::inserter.
Instead it inserts at the back of the container directly, using
the first supported one of emplace_back, push_back, emplace, and insert.
Using emplace avoids creating a temporary that has to be moved into the
container, for cases where the source range and the destination
container do not have the same value type.
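A usage sketch of the converting case this improves:

#include <ranges>
#include <string>
#include <vector>

std::vector<std::string> f ()
{
  const char *words[] = { "one", "two" };
  // Each string is now emplaced directly into the vector instead of
  // being materialized as a temporary and then moved in.
  return std::ranges::to<std::vector<std::string>> (words);
}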
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__container_insertable): Remove.
(__detail::__container_inserter): Remove.
(ranges::to): Use emplace_back or emplace, as per LWG 4016.
* testsuite/std/ranges/conv/1.cc (Cont4, test_2_1_4): Check for
use of emplace_back and emplace.
Jonathan Wakely [Tue, 5 Dec 2023 10:22:17 +0000 (10:22 +0000)]
libstdc++: Redefine __glibcxx_assert to work in C++23 constexpr
The changes in r14-5979 to support unknown references in constant
expressions caused some test regressions. The way that __glibcxx_assert
is defined for constant evaluation no longer works when
_GLIBCXX_ASSERTIONS is defined.
This change simplifies __glibcxx_assert so that there is only one check,
rather than a constexpr one and a conditionally-enabled runtime one. The
constexpr one does not need to use __builtin_unreachable to cause a
compilation failure, because __glibcxx_assert_fail is not usable in
constant expressions, so that will cause a failure too.
As well as fixing the regressions, this makes the code for the
assertions shorter and simpler, so should be quicker to compile, and
might inline better too.
libstdc++-v3/ChangeLog:
* include/bits/c++config (__glibcxx_assert_fail): Declare even
when assertions are not enabled.
(__glibcxx_constexpr_assert): Remove macro.
(__glibcxx_assert_impl): Remove macro.
(_GLIBCXX_ASSERT_FAIL): New macro.
(_GLIBCXX_DO_ASSERT): New macro.
(__glibcxx_assert): Simplify to a single definition that works
at runtime and during constant evaluation.
* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
Adjust expected errors.
* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc:
Likewise.
* testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc:
Likewise.
* testsuite/23_containers/span/back_neg.cc: Likewise.
* testsuite/23_containers/span/front_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_neg.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.
outer_prec is the PRECISION of RVVM1SImode and inner_prec is the
PRECISION of V64SImode when it is zvl512b: outer_prec is a VLA mode
with size (512, 512), while inner_prec is a VLS mode with size
(2048, 0), so their precision/size relationship is not certain.
So block VLS modes according to TARGET_MAX_LMUL and
BITS_PER_RISCV_VECTOR; then we never reach the situation of comparing
the precision/size of a VLA mode against a VLS mode whose size
> coeffs[0] of the VLA mode.
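As a concrete reading of those coefficients: the VLA size (512, 512)
stands for 512 + 512 * x for some runtime factor x >= 0, while the VLS
size (2048, 0) is simply 2048. Since 512 + 512 * x is below 2048 for
x < 3 and above it for x > 3, neither ordering is provable at compile
time, which is why such comparisons have to be avoided.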
Jakub Jelinek [Tue, 5 Dec 2023 22:32:19 +0000 (23:32 +0100)]
libiberty: Fix build with GCC < 7
Tobias reported on IRC that the linker fails to build with GCC 4.8.5.
In configure I've tried to use everything actually used in the sha1.c
x86 hw implementation, but unfortunately I forgot about implicit function
declarations. GCC before 7 did have the <cpuid.h> header, the bit_SHA
define and the __get_cpuid function defined inline, but it didn't
define __get_cpuid_count. That compiled fine (and the configure test
is intentionally compile time only) due to the implicit function
declaration, but then failed to link when linking the linker, because
__get_cpuid_count wasn't defined anywhere.
The following patch fixes that by using what autoconf uses in AC_CHECK_DECL
to make sure the functions are declared.
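A hedged sketch of the AC_CHECK_DECL-style idea (not the literal
configure code): naming each function as a plain expression fails to
compile if <cpuid.h> does not actually declare it, whereas a call
would be quietly accepted through an implicit declaration by old
compilers.

#include <cpuid.h>

int main()
{
  // Errors out if <cpuid.h> lacks the declarations; a bare call
  // would have compiled anyway via implicit function declaration.
  (void) __get_cpuid;
  (void) __get_cpuid_count;
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return 1;
  return (ebx & bit_SHA) ? 0 : 1;
}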
2023-12-05 Jakub Jelinek <jakub@redhat.com>
* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and
__get_cpuid_count are not implicitly declared.
* configure: Regenerated.
David Faust [Mon, 4 Dec 2023 22:08:03 +0000 (14:08 -0800)]
btf: avoid wrong DATASEC entries for extern vars [PR112849]
The process of creating BTF_KIND_DATASEC records involves iterating
through variable declarations, determining which section they will be
placed in, and creating an entry in the appropriate DATASEC record
accordingly.
For variables without e.g. an explicit __attribute__((section)), we use
categorize_decl_for_section () to identify the appropriate named section
and corresponding BTF_KIND_DATASEC record.
This was incorrectly being done for 'extern' variable declarations as
well as non-extern ones, which meant that extern variable declarations
could result in BTF_KIND_DATASEC entries claiming the variable is
allocated in some section such as '.bss' without any knowledge whether
that is actually true. That resulted in errors building the Linux kernel
BPF selftests.
This patch corrects btf_collect_datasec () to avoid assuming a section
for extern variables, and only emit BTF_KIND_DATASEC entries for them if
they have a known section.
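A short illustration of the distinction being fixed (hypothetical
variable names):

int defined_var;        // allocated in this unit: a '.bss' DATASEC
                        // entry for it is correct
extern int extern_var;  // declaration only: its section is unknown
                        // here, so BTF must not claim one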
gcc/
PR debug/112849
* btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
entry in a BTF_KIND_DATASEC record for extern variable decls without
a known section.
gcc/testsuite/
PR debug/112849
* gcc.dg/debug/btf/btf-datasec-3.c: New test.
As reported, libgfortran fails to build on targets where int32_t and int
are different types, because it uses int vs. GFC_INTEGER_4 (under the hood
int32_t) interchangeably.
The following patch fixes that.
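A hedged sketch of the underlying portability issue (the names here
are illustrative): on targets where int32_t is a typedef for long
rather than int, an int * does not convert to a GFC_INTEGER_4 *
(i.e. int32_t *) even though both point at 32-bit integers, so the
interchangeable use fails to compile.

typedef long gfc_int4;             // stand-in for int32_t == long targets

void child_read(gfc_int4 *iostat); // hypothetical callee

void caller()
{
  gfc_int4 noiostat = 0;
  child_read(&noiostat);   // OK: matching pointer types
  // int raw = 0;
  // child_read(&raw);     // error: int * is not gfc_int4 *
}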
2023-12-05 Florian Weimer <fweimer@redhat.com>
Jakub Jelinek <jakub@redhat.com>
* io/list_read.c (list_formatted_read_scalar) <case BT_CLASS>:
Change types of unit and noiostat to GFC_INTEGER_4 from int, change
type of child_iostat to GFC_INTEGER_4 * from int *, formatting
fixes.
(nml_read_obj): Likewise.
* io/write.c (list_formatted_write_scalar) <case BT_CLASS>: Likewise.
(nml_write_obj): Likewise.
* io/transfer.c (unformatted_read, unformatted_write): Likewise.
Jakub Jelinek [Tue, 5 Dec 2023 21:54:08 +0000 (22:54 +0100)]
c++: Further #pragma GCC unroll C++ fix [PR112795]
When committing the #pragma GCC unroll patch, I found I forgot one spot
for diagnosing the invalid unrolls - if the #pragma GCC unroll argument
is dependent and the pragma is before a range-based for loop, the unroll
tree (now, before being converted to ushort) is saved into
RANGE_FOR_UNROLL and tsubst_stmt was RECURing on it, but didn't diagnose
if it was invalid, and so we ICEd later in the middle-end when
ANNOTATE_EXPR had an unexpected argument.
The following patch fixes that. So that the diagnostics aren't done in 3
different places, the patch introduces a new function that
cp_parser_pragma_unroll and the instantiation of ANNOTATE_EXPR and
RANGE_FOR_STMT can all use.
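A hedged reduction of the fixed scenario (the real unroll-7.C and
unroll-8.C tests may differ): the pragma argument is dependent and
precedes a range-based for loop, so its tree lands in RANGE_FOR_UNROLL
and must be diagnosed at instantiation time rather than ICEing later.

template<int N>
void f(int (&a)[4])
{
  #pragma GCC unroll N       // dependent: only checkable at instantiation
  for (int x : a)
    (void) x;
}

void g(int (&a)[4])
{
  f<4>(a);     // fine
  // f<-1>(a); // must be rejected at instantiation, not ICE in the
               // middle-end
}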
* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
-std=gnu++98.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/ext/unroll-7.C: New test.
* g++.dg/ext/unroll-8.C: New test.
Jakub Jelinek [Tue, 5 Dec 2023 20:39:31 +0000 (21:39 +0100)]
rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]
The middle-end has been changed quite recently to canonicalize
-abs (x) to copysign (x, -1) rather than the other way around.
While I agree with that at GIMPLE level, since it matches the GIMPLE
goal of as few operations as possible for a canonical form (-abs (x)
is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
I don't really like that being done on RTL as well (or at least
not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
because on most targets most floating point constants need to be loaded
from memory; there are a few exceptions, but -1 is often not one of them.
Anyway, the following patch fixes the rs6000 regression caused by the
change in GIMPLE canonicalization (i.e. the desirable one). rs6000
clearly prefers the -abs (x) form: it has a single instruction for
that, while its copysign instruction requires loading the -1 from
memory. So the following patch just ensures the copysign expander
can actually see the floating point constant, in which case it emits
the -abs (x) code (or, in the hypothetical case of copysign with a
non-negative constant, abs (x); though copysign (x, 1) is already
canonicalized to abs (x) in GIMPLE), and otherwise forces the operand
to the expected gpc_reg_operand and does what it did before.
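A hedged illustration of the affected pattern: at the GIMPLE level the
function below now reaches expansion as copysign (x, -1), and the
expander change lets rs6000 see the constant and emit the
single-instruction negated-abs form instead of loading -1.0 from
memory for a copysign instruction.

double
neg_abs (double x)
{
  return -__builtin_fabs (x);  /* canonicalized to copysign (x, -1) */
}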
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/112606
* config/rs6000/rs6000.md (copysign<mode>3): Change predicate
of the last argument from gpc_reg_operand to any_operand. If
operands[2] is CONST_DOUBLE, emit abs or neg abs depending on
its sign, otherwise if it doesn't satisfy gpc_reg_operand,
force it to REG using copy_to_mode_reg.
Harald Anlauf [Mon, 4 Dec 2023 21:44:53 +0000 (22:44 +0100)]
Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]
gcc/fortran/ChangeLog:
PR fortran/100988
* gfortran.h (IS_PROC_POINTER): New macro.
* trans-types.cc (gfc_sym_type): Use macro in determination if the
restrict qualifier can be used for a dummy variable. Fix logic to
allow the restrict qualifier also for optional arguments, and to
not apply it to pointer or proc_pointer arguments.
GCC 5 and earlier applied array-to-pointer decay too early,
which affected the new attribute namespace code. A reduced
example of the construct that the attribute code uses is:
struct S { template<__SIZE_TYPE__ N> S(int (&)[N]); };
struct T { int a; S b; };
int a[] = { 1 };
T t = { 1, a };
This patch tries to add a minimally-invasive workaround.
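With the workaround, the initialization above gains an extra level of
braces (a hedged reconstruction of the shape the ChangeLog entries
below describe):

T t2 = { 1, { a } };  // extra braces avoid the premature decay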
gcc/ada/
* gcc-interface/utils.cc (gnat_internal_attribute_table): Add extra
braces to work around PR 16333 in older compilers.
gcc/
* attribs.cc (handle_ignored_attributes_option): Add extra
braces to work around PR 16333 in older compilers.
* config/aarch64/aarch64.cc (aarch64_gnu_attribute_table): Likewise.
(aarch64_arm_attribute_table): Likewise.
* config/arm/arm.cc (arm_gnu_attribute_table): Likewise.
* config/i386/i386-options.cc (ix86_gnu_attribute_table): Likewise.
* config/ia64/ia64.cc (ia64_gnu_attribute_table): Likewise.
* config/rs6000/rs6000.cc (rs6000_gnu_attribute_table): Likewise.
* target-def.h (TARGET_GNU_ATTRIBUTES): Likewise.
* genhooks.cc (emit_init_macros): Likewise, when emitting the
instantiation of TARGET_ATTRIBUTE_TABLE.
* langhooks-def.h (LANG_HOOKS_INITIALIZER): Likewise, when
instantiating LANG_HOOKS_ATTRIBUTE_TABLE.
(LANG_HOOKS_ATTRIBUTE_TABLE): Define to be empty by default.
* target.def (attribute_table): Likewise.
gcc/c-family/
* c-attribs.cc (c_common_gnu_attribute_table): Add extra
braces to work around PR 16333 in older compilers.
gcc/c/
* c-decl.cc (std_attribute_table): Add extra braces to work
around PR 16333 in older compilers.
gcc/cp/
* tree.cc (cxx_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.
gcc/d/
* d-attribs.cc (d_langhook_common_attribute_table): Add extra braces
to work around PR 16333 in older compilers.
(d_langhook_gnu_attribute_table): Likewise.
gcc/fortran/
* f95-lang.cc (gfc_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.
gcc/jit/
* dummy-frontend.cc (jit_gnu_attribute_table): Add extra braces
to work around PR 16333 in older compilers.
(jit_format_attribute_table): Likewise.
gcc/lto/
* lto-lang.cc (lto_gnu_attribute_table): Add extra braces to work
around PR 16333 in older compilers.
(lto_format_attribute_table): Likewise.
All set_debug_format member functions should be guarded by the
__cpp_lib_formatting_ranges macro (which is not defined yet).
libstdc++-v3/ChangeLog:
PR libstdc++/112832
* include/std/format (formatter::set_debug_format): Ensure this
member is defined conditionally for all specializations.
* testsuite/std/format/formatter/112832.cc: New test.
Jakub Jelinek [Tue, 5 Dec 2023 16:38:46 +0000 (17:38 +0100)]
c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]
Seems in 2017 attribute-specifier-seq[opt] was added to asm-declaration
and the change was voted in as a DR.
The following patch implements it by parsing the attributes and warning
about them.
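An example of the construct this makes valid (the attribute itself is
still ignored, with a -Wattributes warning):

[[gnu::unused]] asm ("");  // attribute-specifier-seq before asm-declaration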
I found one attribute parsing bug I'll send a fix for momentarily.
And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or how the attribute chain var is called) is non-NULL,
but if it is non-NULL and contains at least one non-attribute_ignored_p
attribute? cp_parser_declaration at least tries:
if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
but attribute_ignored_p here checks only the first attribute rather than
the whole chain, so it will incorrectly fail to warn if an ignored
attribute is followed by a non-ignored one.
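A hedged sketch of the helper being mused about (whether it would be
written exactly like this is an open question in the text above):

/* Return true only if every attribute in ATTRS is ignored, so that
   the generic "attribute ignored" warning is only suppressed when
   the whole chain was suppressed via -Wno-attributes=.  */
static bool
all_attributes_ignored_p (tree attrs)
{
  for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
    if (!attribute_ignored_p (attr))
      return false;
  return true;
}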
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR c++/110734
* parser.cc (cp_parser_block_declaration): Implement C++ DR 2262
- Attributes for asm-definition. Call cp_parser_asm_definition
even if RID_ASM token is only seen after sequence of standard
attributes.
(cp_parser_asm_definition): Parse standard attributes before
RID_ASM token and warn for them with -Wattributes.
* g++.dg/DRs/dr2262.C: New test.
* g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors
on attributes on asm definitions.
* g++.dg/gomp/attrs-11.C: Remove 2 expected errors.
Gaius Mulley [Tue, 5 Dec 2023 14:54:00 +0000 (14:54 +0000)]
PR modula2/112865 IM and RE fail to skip type equivalences
This patch skips type equivalences when checking the IM and RE
ISO M2 standard functions for complex data type operands.
gcc/m2/ChangeLog:
PR modula2/112865
* gm2-compiler/M2Quads.mod (BuildReFunction): Use
GetDType to retrieve the type of the operand when
converting the complex type to its scalar equivalent.
(BuildImFunction): Use GetDType to retrieve the type of the
operand when converting the complex type to its scalar
equivalent.
Richard Biener [Mon, 4 Dec 2023 09:35:38 +0000 (10:35 +0100)]
middle-end/112830 - avoid gimplifying non-default addr-space assign to memcpy
The following avoids turning aggregate copies involving non-default
address-spaces into memcpy, since that path is not prepared for them.
GIMPLE verification no longer accepts WITH_SIZE_EXPR in aggregate
copies, the following re-allows that for the RHS. I also needed
to adjust one assert in DCE.
get_memory_address is used for string builtin expansion, so instead
of fixing that up for non-generic address-spaces I've put an assert
there.
I'll note that the same issue exists for initialization from an
empty CTOR, which we gimplify to a memset call. But since we are
not prepared to handle RTL expansion of the original VLA init,
I failed to provide test coverage (without extending the GNU C
extension for VLA structs), and the Ada frontend (and other frontends)
do not have address-space support, the patch instead asserts that we
only see generic address-spaces there.
PR middle-end/112830
* gimplify.cc (gimplify_modify_expr): Avoid turning aggregate
copy of non-generic address-spaces to memcpy.
(gimplify_modify_expr_to_memcpy): Assert we are dealing with
a copy inside the generic address-space.
(gimplify_modify_expr_to_memset): Likewise.
* tree-cfg.cc (verify_gimple_assign_single): Allow
WITH_SIZE_EXPR as part of the RHS of an assignment.
* builtins.cc (get_memory_address): Assert we are dealing
with the generic address-space.
* tree-ssa-dce.cc (ref_may_be_aliased): Handle WITH_SIZE_EXPR.
* gcc.target/avr/pr112830.c: New testcase.
* gcc.target/i386/pr112830.c: Likewise.
Richard Biener [Tue, 5 Dec 2023 07:50:57 +0000 (08:50 +0100)]
tree-optimization/112856 - fix LC SSA after loop header copying
When loop header copying unloops loops we may have to fix up
LC SSA. I've taken the opportunity to streamline the unloop_loops
API, removing the use of an ivcanon-local global variable.
PR tree-optimization/109689
PR tree-optimization/112856
* cfgloopmanip.h (unloop_loops): Adjust API.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Take edges_to_remove
as parameter.
(canonicalize_induction_variables): Adjust.
(tree_unroll_loops_completely): Likewise.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Rewrite into
LC SSA if we unlooped some loops and we are in LC SSA.
* gcc.dg/torture/pr109689.c: New testcase.
* gcc.dg/torture/pr112856.c: Likewise.
Jakub Jelinek [Tue, 5 Dec 2023 12:17:57 +0000 (13:17 +0100)]
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
I added a while back to use smaller code than movabsq if possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates
an invalid insn (as a 0xfa1e0ff3 CONST_INT is not allowed as
x86_64_immediate_operand nor x86_64_zext_immediate_operand), and the
peephole2 even triggers on it again and again (this time with shift 0)
until it gives up.
The following patch fixes that. As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).
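A hedged guess at the shape of a trigger (the actual pr112845.c
testcase may differ): the folded constant below has the form
i32 << shift with i32 == 0xfa1e0ff3 (the ENDBR64 encoding, which
-fcf-protection forbids as an immediate) and shift not divisible by 8.

unsigned long long
f (void)
{
  return 0xfa1e0ff3ULL << 12;  /* compiled with -Os -fcf-protection */
}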
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.
This patch adds support for the SME2 <arm_sme.h> intrinsics. The
convention I've used is to put stuff in aarch64-sve-builtins-sme.*
if it relates to ZA, ZT0, the streaming vector length, or other
such SME state. Things that operate purely on predicates and
vectors go in aarch64-sve-builtins-sve2.* instead. Some of these
will later be picked up for SVE2p1.
We previously used Uph internally as a constraint for 16-bit
immediates to atomic instructions. However, we need a user-facing
constraint for the upper predicate registers (already available as
PR_HI_REGS), and Uph makes a natural pair with the existing Upl.
gcc/
* config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro.
(P_ALIASES): Likewise.
(REGISTER_NAMES): Add pn aliases of the predicate registers.
(W8_W11_REGNUM_P): New macro.
(W8_W11_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
* config/aarch64/aarch64.cc (aarch64_print_operand): Add support
for %K, which prints a predicate as a counter. Handle tuples of
predicates.
(aarch64_regno_regclass): Handle W8_W11_REGS.
(aarch64_class_max_nregs): Likewise.
* config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints.
(x, y): Move further up file.
(Uph): Redefine as the high predicate registers, renaming the old
constraint to...
(Uih): ...this.
* config/aarch64/predicates.md (const_0_to_7_operand): New predicate.
(const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise.
(const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise.
(aarch64_simd_shift_imm_qi): Use const_0_to_7_operand.
* config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY)
(VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4)
(SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24)
(SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24)
(SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24)
(SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators.
(UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs.
(UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN)
(UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN)
(UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB)
(UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise.
(UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise.
(UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise.
(UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT)
(UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ)
(UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS)
(UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise.
(UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise.
(UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise.
(UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise.
(Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv)
(VSINGLE, vsingle, b): Add tuple modes.
(v2xwide, za32_offset_range, za64_offset_range, za32_long)
(za32_last_offset, vg_modifier, z_suffix, aligned_operand)
(aligned_fpr): New mode attributes.
(SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI)
(SVE_FP_BINARY_MULTI): New int iterators.
(SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN)
(SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ)
(UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI)
(SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD)
(SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE)
(SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS)
(LUTI_BITS): New int iterators.
(optab, sve_int_op): Handle the new unspecs.
(sme_int_op, has_16bit_form): New int attributes.
(bits_etype): Handle 64.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
* config/aarch64/atomics.md (cas_short_expected_imm): Use Uhi
rather than Uph for HImode immediates.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): New patterns.
(@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to...
(@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>)
(@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>):
...these new patterns.
(SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants. Add
SVE_WHILE_B to existing while patterns.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>)
(@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2)
(@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus)
(@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2)
(<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>)
(@aarch64_sve_<sve_int_op><mode>): New patterns.
(@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>)
(*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>)
(@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x)
(@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2)
(@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns.
(@aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(*aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise.
(aarch64_sve_fdotvnx4sfvnx8hf): Likewise.
(aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise.
(@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise.
(truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise.
(<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise.
(@aarch64_sve_sel<mode>): Likewise.
(@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise.
(@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise.
(@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise.
(@aarch64_sve_<optab><mode>): Likewise.
* config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>)
(*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>)
(*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns.
(*aarch64_sme_write<mode>_plus aarch64_sme_zero_zt0): Likewise.
(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
(@aarch64_sme_single_<optab><mode>): Likewise.
(*aarch64_sme_single_<optab><mode>_plus): Likewise.
(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>)
(*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(@aarch64_sme_lut<LUTI_BITS><mode>): Likewise.
(UNSPEC_SME_LUTI): New unspec.
* config/aarch64/aarch64-sve-builtins.def (single): New mode suffix.
(c8, c16, c32, c64): New type suffixes.
(vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2)
(vg4x4): New group suffixes.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0)
(CP_WRITE_ZT0): New constants.
(get_svbool_t): Delete.
(function_resolver::report_mismatched_num_vectors): New member
function.
(function_resolver::resolve_conversion): Likewise.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_nonscalar_type): Likewise.
(function_resolver::finish_opt_single_resolution): Likewise.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter.
(function_expander::map_to_rtx_codes): Add an extra parameter
for unconditional FP unspecs.
(function_instance::gp_type_index): New member function.
(function_instance::gp_type): Likewise.
(function_instance::gp_mode): Handle multi-vector operations.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count)
(TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen)
(TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2)
(TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4)
(TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu)
(TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer)
(TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned)
(TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type
macros.
(groups_x2, groups_x12, groups_x4, groups_x24, groups_x124)
(groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4)
(groups_vg24): New group arrays.
(function_instance::reads_global_state_p): Handle CP_READ_ZT0.
(function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0.
(add_shared_state_attribute): Handle zt0 state.
(function_builder::add_overloaded_functions): Skip MODE_single
for non-tuple groups.
(function_resolver::report_mismatched_num_vectors): New function.
(function_resolver::resolve_to): Add a fallback error message for
the general two-type case.
(function_resolver::resolve_conversion): New function.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_matching_vector_type): Specifically
diagnose mismatched vector counts.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter. Extend to handle cases where
tuples are expected.
(function_resolver::require_nonscalar_type): New function.
(function_resolver::check_gp_argument): Use gp_type_index rather
than hard-coding VECTOR_TYPE_svbool_t.
(function_resolver::finish_opt_single_resolution): New function.
(function_checker::require_immediate_either_or): Remove hard-coded
constants.
(function_expander::direct_optab_handler): New function.
(function_expander::use_pred_x_insn): Only add a strictness flag
if the insn has an operand for it.
(function_expander::map_to_rtx_codes): Take an unconditional
FP unspec as an extra parameter. Handle tuples and MODE_single.
(function_expander::map_to_unspecs): Handle tuples and MODE_single.
* config/aarch64/aarch64-sve-builtins-functions.h (read_zt0)
(write_zt0): New typedefs.
(full_width_access::memory_vector): Use the function's
vectors_per_tuple.
(rtx_code_function_base): Add an optional unconditional FP unspec.
(rtx_code_function::expand): Update accordingly.
(rtx_code_function_rotated::expand): Likewise.
(unspec_based_function_exact_insn::expand): Use tuple_mode instead
of vector_mode.
(unspec_based_uncond_function): New typedef.
(cond_or_uncond_unspec_function): New class.
(sme_1mode_function::expand): Handle single forms.
(sme_2mode_function_t): Likewise, adding a template parameter for them.
(sme_2mode_function): Update accordingly.
(sme_2mode_lane_function): New typedef.
(multireg_permute): New class.
(class integer_conversion): Likewise.
(while_comparison::expand): Handle svcount_t and svboolx2_t results.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp, compare_scalar_count, count_pred_c)
(dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane)
(extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice)
(select_pred, shift_right_imm_narrowxn, storexn, str_zt)
(unary_convertxn, unary_za_slice, unaryxn, write_za)
(write_za_slice): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(za_group_is_pure_overload): New function.
(apply_predication): Use the function's gp_type for the predicate,
instead of hard-coding the use of svbool_t.
(parse_element_type): Add support for "c" (svcount_t).
(parse_type): Add support for "c0" and "c1" (conversion destination
and source types).
(binary_za_slice_lane_base): New class.
(binary_za_slice_opt_single_base): Likewise.
(load_contiguous_base::resolve): Pass the group suffix to r.resolve.
(luti_lane_zt_base): New class.
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp): New shapes.
(compare_scalar_def::build): Allow the return type to be a tuple.
(compare_scalar_def::expand): Pass the group suffix to r.resolve.
(compare_scalar_count, count_pred_c, dot_za_slice_int_lane)
(dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt)
(ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn)
(storexn, str_zt): New shapes.
(ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with...
(ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these
new classes. Allow a second suffix that specifies the type of the
second vector argument, and that is used to derive the third.
(unary_def::build): Extend to handle tuple types.
(unary_convert_def::build): Use the new c0 and c1 format specifiers.
(unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes.
(write_za_slice): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand)
(svext_bhw_impl::expand): Update call to map_to_rtx_codes.
(svcntp_impl::expand): Handle svcount_t variants.
(svcvt_impl::expand): Handle unpredicated conversions separately,
dealing with tuples.
(svdot_impl::expand): Handle 2-way dot products.
(svdotprod_lane_impl::expand): Likewise.
(svld1_impl::fold): Punt on tuple loads.
(svld1_impl::expand): Handle tuple loads.
(svldnt1_impl::expand): Likewise.
(svpfalse_impl::fold): Punt on svcount_t forms.
(svptrue_impl::fold): Likewise.
(svptrue_impl::expand): Handle svcount_t forms.
(svrint_impl): New class.
(svsel_impl::fold): Punt on tuple forms.
(svsel_impl::expand): Handle tuple forms.
(svst1_impl::fold): Punt on tuple loads.
(svst1_impl::expand): Handle tuple loads.
(svstnt1_impl::expand): Likewise.
(svwhilelx_impl::fold): Punt on tuple forms.
(svdot_lane): Use UNSPEC_FDOT.
(svmax, svmaxnm, svmin, svminnm): Add unconditional FP unspecs.
(rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl.
* config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2)
(svset2, svundef2): Add _b variants.
(svcvt): Use unary_convertxn.
(svdot): Use ternary_qq_opt_n_or_011.
(svdot_lane): Use ternary_qq_or_011_lane.
(svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n.
(svpfalse): Add a form that returns svcount_t results.
(svrinta, svrintm, svrintn, svrintp): Use unaryxn.
(svsel): Use binaryxn.
(svst1, svstnt1): Use storexn.
* config/aarch64/aarch64-sve-builtins-sme.h
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): Declare.
* config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base):
Rename to...
(load_store_za_zt0_base): ...this and extend to tuples.
(load_za_base, store_za_base): Update accordingly.
(expand_ldr_str_zt0): New function.
(svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl)
(svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes.
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): New functions.
* config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl)
(svcvtn_impl, svpext_impl, svpsel_impl): New classes.
(svqrshl_impl::fold): Update for change to svrshl shape.
(svrshl_impl::fold): Punt on tuple forms.
(svsqadd_impl::expand): Update call to map_to_rtx_codes.
(svunpk_impl): New class.
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): New functions.
* config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine __ARM_FEATURE_SME2.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way
for test functions to share ZT0.
(ATTR): Update accordingly.
(TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN)
(TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C)
(TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE)
(TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW)
(TEST_X4_NARROW): New macros.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove
test for svmopa that becomes valid with SME2.
* gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for
existence of svboolx2_t version of svcreate2.
* gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error
messages to account for svcount_t predication.
* gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust
error messages to account for new SME2 variants.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.
SME2 adds a 512-bit lookup table called ZT0. It is enabled
and disabled by PSTATE.ZA, just like ZA itself. This patch
adds support for the register, including saving and restoring
contents.
The code reuses the V8DI that was added for LS64, including
the associated memory classification rules. (The ZT0 range
is more restricted than the LS64 range, but that's enforced
by predicates and constraints.)
gcc/
* config/aarch64/aarch64.md (ZT0_REGNUM): New constant.
(LAST_FAKE_REGNUM): Bump to include it.
* config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0.
(CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(machine_function): Add zt0_save_buffer.
(CUMULATIVE_ARGS): Add shared_zt0_flags.
* config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0.
(aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise.
(aarch64_function_arg): Add the shared ZT0 flags as an extra
limb of the parallel.
(aarch64_init_cumulative_args): Initialize shared_zt0_flags.
(aarch64_extra_live_on_entry): Handle ZT0_REGNUM.
(aarch64_epilogue_uses): Likewise.
(aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions.
(aarch64_restore_zt0): Likewise.
(aarch64_start_call_args): Reject calls to functions that share
ZT0 from functions that have no ZT0 state. Save ZT0 around shared-ZA
calls that do not share ZT0.
(aarch64_expand_call): Handle ZT0. Reject calls to functions that
share ZT0 but not ZA from functions with ZA state.
(aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions
that do not share ZT0.
(aarch64_set_current_function): Require +sme2 for functions that
have ZT0 state.
(aarch64_function_attribute_inlinable_p): Don't allow functions to
be inlined if they have local zt0 state.
(AARCH64_IPA_CLOBBERS_ZT0): New constant.
(aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0.
(aarch64_can_inline_p): Don't inline callees that clobber ZT0
into functions that have ZT0 state.
(aarch64_comp_type_attributes): Check for compatible ZT0 sharing.
(aarch64_optimize_mode_switching): Use mode switching if the
function has ZT0 state.
(aarch64_mode_emit_local_sme_state): Save and restore ZT0 around
calls to private-ZA functions.
(aarch64_mode_needed_local_sme_state): Require ZA to be active
for instructions that access ZT0.
(aarch64_mode_entry): Mark ZA as dead on entry if the function
only shares state other than "za" itself.
(aarch64_mode_exit): Likewise mark ZA as dead on return.
(aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __ARM_STATE_ZT0.
* config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv.
(aarch64_asm_update_zt0): New insn.
(UNSPEC_RESTORE_ZT0): New unspec.
(aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns.
(aarch64_sme_str_zt0): Likewise.
SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.
The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work. At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.
We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework. All that changes on the PCS side is
that we now have an associated mode.
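A hedged usage sketch of the new type (the _b forms of svcreate2 and
svget2 are added elsewhere in this series):

#include <arm_sve.h>

// svboolx2_t is a pair of svbool_t values, represented internally
// in the new VNx32BI mode.
svbool_t
second_of_pair (svbool_t a, svbool_t b)
{
  svboolx2_t pair = svcreate2 (a, b);
  return svget2 (pair, 1);  // extract the second predicate
}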
gcc/
* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(register_tuple_type): Handle tuples of predicates.
(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
of predicates.
(pure_scalable_type_info::add_piece): Don't try to form pairs of
predicates.
(VEC_STRUCT): Generalize comment.
(aarch64_classify_vector_mode): Handle VNx32BI.
(aarch64_array_mode): Likewise. Return BLKmode for arrays of
predicates that have no associated mode, rather than allowing
an integer mode to be chosen.
(aarch64_hard_regno_nregs): Handle VNx32BI.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_split_double_move): New function, split out from...
(aarch64_split_128bit_move): ...here.
(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
(aarch64_pfalse_reg): Likewise.
(aarch64_sve_same_pred_for_ptest_p): Likewise.
(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
(aarch64_expand_mov_immediate): Restrict handling of boolean vector
constants to single-predicate modes.
(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
can be addressed.
(aarch64_class_max_nregs): Handle VNx32BI.
(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
VNx32BI.
(aarch64_mov_operand_p): Restrict predicate constant canonicalization
to single-predicate modes.
(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
* config/aarch64/constraints.md (PR_REGS): New predicate.
Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks. The SME2 ACLE defines an svcount_t type for
this interpretation.
I don't think we have a better way of representing counters than
the VNx16BI that we use for masks. The patch therefore doesn't
add a new mode for this representation. It's just something that
is interpreted in context, a bit like signed vs. unsigned integers.
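A hedged sketch using the b<->c reinterpret forms added below
(assuming the ACLE spellings svreinterpret_c and svreinterpret_b):

#include <arm_sve.h>

// The same predicate register contents viewed as a bit-per-byte mask
// (svbool_t) or as a counter (svcount_t); the reinterpret changes the
// interpretation, not the value.
svcount_t as_counter (svbool_t pg) { return svreinterpret_c (pg); }
svbool_t  as_mask    (svcount_t pn) { return svreinterpret_b (pn); }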
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Handle reinterprets between svbool_t
and svcount_t.
(svreinterpret_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add
b<->c forms.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New
type suffix list.
(wrap_type_in_struct, register_type_decl): New functions, split out
from...
(register_tuple_type): ...here.
(register_builtin_types): Handle svcount_t.
(handle_arm_sve_h): Don't create tuples of svcount_t.
* config/aarch64/aarch64-sve-builtins.def (svcount_t): New type.
(c): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class.
gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test
for svcount_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
* g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P)
(TEST_DUAL_P_REV): New macros.
* gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test.
* gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing
an svcount_t.
* gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test
reinterprets involving svcount_t.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t.
* gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_12.c: New test.
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").
Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function. Any function can tail-call a streaming-compatible function.
gcc/
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
Enforce PSTATE.SM and PSTATE.ZA restrictions.
(aarch64_expand_epilogue): Save and restore the arguments
to a sibcall around any change to PSTATE.SM.
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.
A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.
A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.
gcc/
* config/aarch64/aarch64.cc: Include symbol-summary.h, ipa-prop.h,
and ipa-fnsummary.h
(aarch64_function_attribute_inlinable_p): New function.
(AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA): New constants.
(aarch64_need_ipa_fn_target_info): New function.
(aarch64_update_ipa_fn_target_info): Likewise.
(aarch64_can_inline_p): Restrict the previous ISA flag checks
to non-modal features. Prevent callees that require a particular
PSTATE.SM state from being inlined into callers that can't guarantee
that state. Also prevent callees that have ZA state from being
inlined into callers that don't. Finally, prevent callees that
clobber ZA from being inlined into callers that have ZA state.
(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver. Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was originally using.
The requirement on nonlocal goto receivers means that nonlocal
jumps need to ensure that PSTATE.SM is zero.
gcc/
* config/aarch64/aarch64.cc: Include except.h
(aarch64_sme_mode_switch_regs::add_call_preserved_reg): New function.
(aarch64_sme_mode_switch_regs::add_call_preserved_regs): Likewise.
(aarch64_need_old_pstate_sm): Return true if the function has
a nonlocal-goto or exception receiver.
(aarch64_switch_pstate_sm_for_landing_pad): New function.
(aarch64_switch_pstate_sm_for_jump): Likewise.
(pass_switch_pstate_sm::gate): Enable the pass for all
streaming and streaming-compatible functions.
(pass_switch_pstate_sm::execute): Handle non-local gotos and their
receivers. Handle exception handler entry points.
This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI. The attribute is valid but redundant for
__arm_streaming functions.
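A short example of the attribute's keyword form:

// The ABI is that of a plain non-streaming function, but the body
// runs in streaming mode: PSTATE.SM is switched on in the prologue
// and back off in the epilogue.
__arm_locally_streaming void
f (void)
{
}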
gcc/
* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
arm::locally_streaming.
(aarch64_fndecl_is_locally_streaming): New function.
(aarch64_fndecl_sm_state): Handle locally-streaming functions.
(aarch64_cfun_enables_pstate_sm): New function.
(aarch64_add_offset): Add an argument that specifies whether
the streaming vector length should be used instead of the
prevailing one.
(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_mov_immediate): Update calls accordingly.
(aarch64_need_old_pstate_sm): Return true for locally-streaming
streaming-compatible functions.
(aarch64_layout_frame): Force all call-preserved Z and P registers
to be saved and restored if the function switches PSTATE.SM in the
prologue.
(aarch64_get_separate_components): Disable shrink-wrapping of
such Z and P saves and restores.
(aarch64_use_late_prologue_epilogue): New function.
(aarch64_expand_prologue): Measure SVE lengths in the streaming
vector length for locally-streaming functions, then emit code
to enable streaming mode.
(aarch64_expand_epilogue): Likewise in reverse.
(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_locally_streaming.
gcc/
* doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64.
* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
to install and aarch64-sve-builtins-sme.o to the list of objects
to build.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine TARGET_SME, TARGET_SME_I16I64 and TARGET_SME_F64F64.
(aarch64_pragma_aarch64): Handle arm_sme.h.
* config/aarch64/aarch64-option-extensions.def (sme-i16i64)
(sme-f64f64): New extensions.
* config/aarch64/aarch64-protos.h (aarch64_sme_vq_immediate)
(aarch64_addsvl_addspl_immediate_p, aarch64_output_addsvl_addspl)
(aarch64_output_sme_zero_za): Declare.
(aarch64_output_move_struct): Delete.
(aarch64_sme_ldr_vnum_offset_p): Declare.
(aarch64_sve::handle_arm_sme_h): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_SM_ON): New macro.
(AARCH64_ISA_SME_I16I64, AARCH64_ISA_SME_F64F64): Likewise.
(TARGET_STREAMING, TARGET_STREAMING_SME): Likewise.
(TARGET_SME_I16I64, TARGET_SME_F64F64): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_rdvl_factor_p): Rename to...
(aarch64_sve_rdvl_addvl_factor_p): ...this.
(aarch64_sve_rdvl_immediate_p): Update accordingly.
(aarch64_rdsvl_immediate_p, aarch64_add_offset): Likewise.
(aarch64_sme_vq_immediate): Likewise. Make public.
(aarch64_sve_addpl_factor_p): New function.
(aarch64_sve_addvl_addpl_immediate_p): Use
aarch64_sve_rdvl_addvl_factor_p and aarch64_sve_addpl_factor_p.
(aarch64_addsvl_addspl_immediate_p): New function.
(aarch64_output_addsvl_addspl): Likewise.
(aarch64_cannot_force_const_mem): Return true for RDSVL immediates.
(aarch64_classify_index): Handle .Q scaling for VNx1TImode.
(aarch64_classify_address): Likewise for vnum offsets.
(aarch64_output_sme_zero_za): New function.
(aarch64_sme_ldr_vnum_offset_p): Likewise.
* config/aarch64/predicates.md (aarch64_addsvl_addspl_immediate):
New predicate.
(aarch64_pluslong_operand): Include it for SME.
* config/aarch64/constraints.md (Ucj, Uav): New constraints.
* config/aarch64/iterators.md (VNx1TI_ONLY): New mode iterator.
(SME_ZA_I, SME_ZA_SDI, SME_ZA_SDF_I, SME_MOP_BHI): Likewise.
(SME_MOP_HSDF): Likewise.
(UNSPEC_SME_ADDHA, UNSPEC_SME_ADDVA, UNSPEC_SME_FMOPA)
(UNSPEC_SME_FMOPS, UNSPEC_SME_LD1_HOR, UNSPEC_SME_LD1_VER)
(UNSPEC_SME_READ_HOR, UNSPEC_SME_READ_VER, UNSPEC_SME_SMOPA)
(UNSPEC_SME_SMOPS, UNSPEC_SME_ST1_HOR, UNSPEC_SME_ST1_VER)
(UNSPEC_SME_SUMOPA, UNSPEC_SME_SUMOPS, UNSPEC_SME_UMOPA)
(UNSPEC_SME_UMOPS, UNSPEC_SME_USMOPA, UNSPEC_SME_USMOPS)
(UNSPEC_SME_WRITE_HOR, UNSPEC_SME_WRITE_VER): New unspecs.
(elem_bits): Handle x2 and x4 structure modes, plus VNx1TI.
(Vetype, Vesize, VPRED): Handle VNx1TI.
(b): New mode attribute.
(SME_LD1, SME_READ, SME_ST1, SME_WRITE, SME_BINARY_SDI, SME_INT_MOP)
(SME_FP_MOP): New int iterators.
(optab): Handle SME unspecs.
(hv): New int attribute.
* config/aarch64/aarch64.md (*add<mode>3_aarch64): Handle ADDSVL
and ADDSPL.
* config/aarch64/aarch64-sme.md (UNSPEC_SME_LDR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_ldr0, @aarch64_sme_ldrn<mode>): New patterns.
(UNSPEC_SME_STR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_str0, @aarch64_sme_strn<mode>): New patterns.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(UNSPEC_SME_ZERO): New unspec.
(aarch64_sme_zero): New pattern.
(@aarch64_sme_<SME_BINARY_SDI:optab><mode>): Likewise.
(@aarch64_sme_<SME_INT_MOP:optab><mode>): Likewise.
(@aarch64_sme_<SME_FP_MOP:optab><mode>): Likewise.
* config/aarch64/aarch64-sve-builtins.def: Add ZA type suffixes.
Include aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION): New macro.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZA): New call
property.
(CP_WRITE_ZA): Likewise.
(PRED_za_m): New predication type.
(type_suffix_index): Handle DEF_SME_ZA_SUFFIX.
(type_suffix_info): Add vector_p and za_p fields.
(function_instance::num_za_tiles): New member function.
(function_builder::get_attributes): Add an aarch64_feature_flags
argument.
(function_expander::get_contiguous_base): Take a base argument
number, a vnum argument number, and an argument that indicates
whether the vnum parameter is a factor of the SME vector length
or the prevailing vector length.
(function_expander::add_integer_operand): Take a poly_int64.
(sve_switcher::sve_switcher): Take a base set of flags.
(sme_switcher): New class.
(scalar_types): Add a null entry for NUM_VECTOR_TYPES.
* config/aarch64/aarch64-sve-builtins.cc: Include
aarch64-sve-builtins-sme.h.
(pred_suffixes): Add an entry for PRED_za_m.
(type_suffixes): Initialize vector_p and za_p. Handle ZA suffixes.
(TYPES_all_za, TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data)
(TYPES_za_s_integer, TYPES_za_d_integer, TYPES_mop_base)
(TYPES_mop_base_signed, TYPES_mop_base_unsigned, TYPES_mop_i16i64)
(TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned, TYPES_za): New
type suffix macros.
(preds_m, preds_za_m): New predication lists.
(function_groups): Handle DEF_SME_ZA_FUNCTION.
(scalar_types): Add an entry for NUM_VECTOR_TYPES.
(find_type_suffix_for_scalar_type): Check positively for vectors
rather than negatively for predicates.
(check_required_extensions): Handle PSTATE.SM and PSTATE.ZA
requirements.
(report_out_of_range): Handle the case where the minimum and
maximum are the same.
(function_instance::reads_global_state_p): Return true for functions
that read ZA.
(function_instance::modifies_global_state_p): Return true for functions
that write to ZA.
(sve_switcher::sve_switcher): Add a base flags argument.
(function_builder::get_name): Handle "__arm_" prefixes.
(add_attribute): Add an overload that takes a namespace.
(add_shared_state_attribute): New function.
(function_builder::get_attributes): Take the required feature flags
as argument. Add streaming and ZA attributes where appropriate.
(function_builder::add_unique_function): Update calls accordingly.
(function_resolver::check_gp_argument): Assert that the predication
isn't ZA _m predication.
(function_checker::function_checker): Don't bias the argument
number for ZA _m predication.
(function_expander::get_contiguous_base): Add arguments that
specify the base argument number, the vnum argument number,
and an argument that indicates whether the vnum parameter is
a factor of the SME vector length or the prevailing vector length.
Handle the SME case.
(function_expander::add_input_operand): Handle pmode_register_operand.
(function_expander::add_integer_operand): Take a poly_int64.
(init_builtins): Call handle_arm_sme_h for LTO.
(handle_arm_sve_h): Skip SME intrinsics.
(handle_arm_sme_h): New function.
* config/aarch64/aarch64-sve-builtins-functions.h
(read_write_za, write_za): New classes.
(unspec_based_sme_function, za_arith_function): New using aliases.
(quiet_za_arith_function): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_za_int_m, binary_za_m, binary_za_uint_m, bool_inherent)
(inherent_za, inherent_mask_za, ldr_za, load_za, read_za_m, store_za)
(str_za, unary_za_m, write_za_m): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
Expect za_m functions to have an existing governing predicate.
(binary_za_m_base, binary_za_int_m_def, binary_za_m_def): New classes.
(binary_za_uint_m_def, bool_inherent_def, inherent_za_def): Likewise.
(inherent_mask_za_def, ldr_za_def, load_za_def, read_za_m_def)
(store_za_def, str_za_def, unary_za_m_def, write_za_m_def): Likewise.
* config/aarch64/arm_sme.h: New file.
* config/aarch64/aarch64-sve-builtins-sme.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): Depend on
aarch64-sve-builtins-sme.def and aarch64-sve-builtins-sme.h.
(aarch64-sve-builtins-sme.o): New rule.
gcc/testsuite/
* lib/target-supports.exp: Add sme and sme-i16i64 features.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test __ARM_FEATURE_SME*
macros.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Allow functions
to be marked as __arm_streaming, __arm_streaming_compatible, and
__arm_inout("za").
* g++.target/aarch64/sve/acle/general-c++/func_redef_4.c: Mark the
function as __arm_streaming_compatible.
* g++.target/aarch64/sve/acle/general-c++/func_redef_5.c: Likewise.
* g++.target/aarch64/sve/acle/general-c++/func_redef_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_4.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_5.c: Likewise.
* g++.target/aarch64/sme/aarch64-sme-acle-asm.exp: New test harness.
* gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c: Likewise.
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.
That rule began to break down in SVE2, and it continues to do
so in SME. This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.
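As a concrete illustration of the two conventions (a sketch using
standard ACLE intrinsics from arm_sve.h, not code taken from the
patch itself): svabs_s32_m takes a separate "inactive" argument,
whereas svadd_s32_m takes the inactive lanes from its first operand.
  #include <arm_sve.h>

  svint32_t
  f (svbool_t pg, svint32_t inactive, svint32_t x, svint32_t y)
  {
    /* Unary _m form: inactive lanes of the result come from the
       separate initial argument.  */
    svint32_t a = svabs_s32_m (inactive, pg, x);
    /* Binary _m form: inactive lanes come from the first operand (x),
       with no separate initial argument.  */
    svint32_t b = svadd_s32_m (pg, x, y);
    return svadd_s32_m (pg, a, b);
  }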
gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_shape::has_merge_argument_p): New member function.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::check_gp_argument): Use it.
(function_expander::get_fallback_value): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(apply_predication): Likewise.
(unary_convert_narrowt_def::has_merge_argument_p): New function.
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
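The kind of change involved can be sketched as follows (a paraphrase
for illustration, not the verbatim patch): the unspec selection reads
the type suffix at a configurable index rather than hard-coding
suffix 0.
  /* Paraphrased sketch: m_suffix_index would be 0 for the existing
     SVE functions and 1 for the new SME ones.  */
  int
  unspec_for (const function_instance &instance) const
  {
    auto &suffix = instance.type_suffix (m_suffix_index);
    return (!suffix.integer_p ? m_unspec_for_fp
            : suffix.unsigned_p ? m_unspec_for_uint
            : m_unspec_for_sint);
  }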
gcc/
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Allow type suffix 1 to determine
the mode of the operation.
(unspec_based_function): Update accordingly.
(unspec_based_fused_function): Likewise.
(unspec_based_fused_lane_function): Likewise.
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others. It's purely an RTL
convenience and isn't (yet) a valid storage mode.
SME has an array called ZA that can be enabled and disabled separately
from streaming mode. A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.
In C and C++, the state of PSTATE.ZA is controlled using function
attributes. There are four attributes that can be attached to
function types to indicate that the function shares ZA with its
caller. These are:
- arm::in("za")
- arm::out("za")
- arm::inout("za")
- arm::preserves("za")
If a function's type has one of these shared-ZA attributes,
PSTATE.ZA is specified to be 1 on entry to the function and on return
from the function. Otherwise, the caller and callee have separate
ZA contexts; they do not use ZA to share data.
Although normal non-shared-ZA functions have a separate ZA context
from their callers, nested uses of ZA are expected to be rare.
The ABI therefore defines a cooperative lazy saving scheme that
allows saves and restores of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)
Functions that want to use ZA internally have an arm::new("za")
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body. It also tells the compiler
to commit any lazy save initiated by a caller.
The patch uses various abstract hard registers to track dataflow
relating to ZA. See the comments in the patch for details.
The lazy save scheme is intended to be transparent to most normal
functions, so that they don't need to be recompiled for SME.
This is reflected in the way that most normal functions ignore
the new hard registers added in the patch.
As with arm::streaming and arm::streaming_compatible, the attributes are
also available as __arm_<attr>. This has two advantages: it triggers an
error on compilers that don't understand the attributes, and it eases
use in C, where [[...]] attributes were only added in C23.
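In source code, the keyword forms look like this (a minimal usage
sketch; the function names are placeholders):
  /* Shares ZA with its caller: PSTATE.ZA is 1 on entry and return.  */
  void accumulate (void) __arm_inout ("za");

  /* Creates new ZA state internally: the compiler enables PSTATE.ZA
     for the body and commits any lazy save started by a caller.  */
  __arm_new ("za") void
  use_za (void)
  {
    accumulate ();
  }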
gcc/
* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
(aarch64_output_rdsvl, aarch64_optimize_mode_switching)
(aarch64_restore_za): Declare.
* config/aarch64/constraints.md (UsR): New constraint.
* config/aarch64/aarch64.md (LOWERING_REGNUM, TPIDR_BLOCK_REGNUM)
(SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, ZA_FREE_REGNUM)
(ZA_SAVED_REGNUM, ZA_REGNUM, FIRST_FAKE_REGNUM): New constants.
(LAST_FAKE_REGNUM): Likewise.
(UNSPEC_SAVE_NZCV, UNSPEC_RESTORE_NZCV, UNSPEC_SME_VQ): New unspecs.
(arches): Add sme.
(arch_enabled): Handle it.
(*cb<optab><mode>1): Rename to...
(aarch64_cb<optab><mode>1): ...this.
(*movsi_aarch64): Add an alternative for RDSVL.
(*movdi_aarch64): Likewise.
(aarch64_save_nzcv, aarch64_restore_nzcv): New insns.
* config/aarch64/aarch64-sme.md (UNSPEC_SMSTOP_ZA)
(UNSPEC_INITIAL_ZERO_ZA, UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE)
(UNSPEC_READ_TPIDR2, UNSPEC_WRITE_TPIDR2, UNSPEC_SETUP_LOCAL_TPIDR2)
(UNSPEC_RESTORE_ZA, UNSPEC_START_PRIVATE_ZA_CALL): New unspecs.
(UNSPEC_END_PRIVATE_ZA_CALL, UNSPEC_COMMIT_LAZY_SAVE): Likewise.
(UNSPECV_ASM_UPDATE_ZA): New unspecv.
(aarch64_tpidr2_save, aarch64_smstart_za, aarch64_smstop_za)
(aarch64_initial_zero_za, aarch64_setup_local_tpidr2)
(aarch64_clear_tpidr2, aarch64_write_tpidr2, aarch64_read_tpidr2)
(aarch64_tpidr2_restore, aarch64_restore_za, aarch64_asm_update_za)
(aarch64_start_private_za_call, aarch64_end_private_za_call)
(aarch64_commit_lazy_save): New patterns.
* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
(FIXED_REGISTERS, REGISTER_NAMES): Add the new fake ZA registers.
(CALL_USED_REGISTERS): Replace with...
(CALL_REALLY_USED_REGISTERS): ...this and add the fake ZA registers.
(FIRST_PSEUDO_REGISTER): Bump to include the fake ZA registers.
(FAKE_REGS): New register class.
(REG_CLASS_NAMES): Update accordingly.
(REG_CLASS_CONTENTS): Likewise.
(machine_function::tpidr2_block): New member variable.
(machine_function::tpidr2_block_ptr): Likewise.
(machine_function::za_save_buffer): Likewise.
(machine_function::next_asm_update_za_id): Likewise.
(CUMULATIVE_ARGS::shared_za_flags): Likewise.
(aarch64_mode_entity, aarch64_local_sme_state): New enums.
(aarch64_tristate_mode): Likewise.
(OPTIMIZE_MODE_SWITCHING, NUM_MODES_FOR_MODE_SWITCHING): Define.
* config/aarch64/aarch64.cc (AARCH64_STATE_SHARED, AARCH64_STATE_IN)
(AARCH64_STATE_OUT): New constants.
(aarch64_attribute_shared_state_flags): New function.
(aarch64_lookup_shared_state_flags, aarch64_fndecl_has_new_state)
(aarch64_check_state_string, cmp_string_csts): Likewise.
(aarch64_merge_string_arguments, aarch64_check_arm_new_against_type)
(handle_arm_new, handle_arm_shared): Likewise.
(handle_arm_new_za_attribute): New function.
(aarch64_arm_attribute_table): Add new, preserves, in, out, and inout.
(aarch64_hard_regno_nregs): Handle FAKE_REGS.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_fntype_shared_flags, aarch64_fntype_pstate_za): New functions.
(aarch64_fntype_isa_mode): Include aarch64_fntype_pstate_za.
(aarch64_fndecl_has_state, aarch64_fndecl_pstate_za): New functions.
(aarch64_fndecl_isa_mode): Include aarch64_fndecl_pstate_za.
(aarch64_cfun_incoming_pstate_za, aarch64_cfun_shared_flags)
(aarch64_cfun_has_new_state, aarch64_cfun_has_state): New functions.
(aarch64_sme_vq_immediate, aarch64_sme_vq_unspec_p): Likewise.
(aarch64_rdsvl_immediate_p, aarch64_output_rdsvl): Likewise.
(aarch64_expand_mov_immediate): Handle RDSVL immediates.
(aarch64_function_arg): Add the ZA sharing flags as a third limb
of the PARALLEL.
(aarch64_init_cumulative_args): Record the ZA sharing flags.
(aarch64_extra_live_on_entry): New function. Handle the new
ZA-related fake registers.
(aarch64_epilogue_uses): Handle the new ZA-related fake registers.
(aarch64_cannot_force_const_mem): Handle UNSPEC_SME_VQ constants.
(aarch64_get_tpidr2_block, aarch64_get_tpidr2_ptr): New functions.
(aarch64_init_tpidr2_block, aarch64_restore_za): Likewise.
(aarch64_layout_frame): Check whether the current function creates
new ZA state. Record that it clobbers LR if so.
(aarch64_expand_prologue): Handle functions that create new ZA state.
(aarch64_expand_epilogue): Likewise.
(aarch64_create_tpidr2_block): New function.
(aarch64_restore_za): Likewise.
(aarch64_start_call_args): Disallow calls to shared-ZA functions
from functions that have no ZA state. Emit a marker instruction
before calls to private-ZA functions from functions that have
SME state.
(aarch64_expand_call): Add return registers for state that is
managed via attributes. Record the use and clobber information
for the ZA registers.
(aarch64_end_call_args): New function.
(aarch64_regno_regclass): Handle FAKE_REGS.
(aarch64_class_max_nregs): Likewise.
(aarch64_override_options_internal): Require TARGET_SME for
functions that have ZA state.
(aarch64_conditional_register_usage): Handle FAKE_REGS.
(aarch64_mov_operand_p): Handle RDSVL immediates.
(aarch64_comp_type_attributes): Check that the ZA sharing flags
are equal.
(aarch64_merge_decl_attributes): New function.
(aarch64_optimize_mode_switching, aarch64_mode_emit_za_save_buffer)
(aarch64_mode_emit_local_sme_state, aarch64_mode_emit): Likewise.
(aarch64_insn_references_sme_state_p): Likewise.
(aarch64_mode_needed_local_sme_state): Likewise.
(aarch64_mode_needed_za_save_buffer, aarch64_mode_needed): Likewise.
(aarch64_mode_after_local_sme_state, aarch64_mode_after): Likewise.
(aarch64_local_sme_confluence, aarch64_mode_confluence): Likewise.
(aarch64_one_shot_backprop, aarch64_local_sme_backprop): Likewise.
(aarch64_mode_backprop, aarch64_mode_entry): Likewise.
(aarch64_mode_exit, aarch64_mode_eh_handler): Likewise.
(aarch64_mode_priority, aarch64_md_asm_adjust): Likewise.
(TARGET_END_CALL_ARGS, TARGET_MERGE_DECL_ATTRIBUTES): Define.
(TARGET_MODE_EMIT, TARGET_MODE_NEEDED, TARGET_MODE_AFTER): Likewise.
(TARGET_MODE_CONFLUENCE, TARGET_MODE_BACKPROP): Likewise.
(TARGET_MODE_ENTRY, TARGET_MODE_EXIT): Likewise.
(TARGET_MODE_EH_HANDLER, TARGET_MODE_PRIORITY): Likewise.
(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
(TARGET_MD_ASM_ADJUST): Use aarch64_md_asm_adjust.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_new, __arm_preserves, __arm_in, __arm_out, and __arm_inout.
This patch adds support for switching to the appropriate SME mode
for each call. Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction. If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.
Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion. (It doesn't use md_reorg because later additions
need the CFG.)
If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards. The old mode must
therefore be available immediately after the call. The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.
Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.
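For example (a sketch with placeholder function names):
  void plain_fn (void);                      /* non-streaming */
  void streaming_fn (void) __arm_streaming;

  void
  sc_fn (void) __arm_streaming_compatible
  {
    /* SMSTOP SM is emitted before this call if we are currently in
       streaming mode; the old mode is restored afterwards.  */
    plain_fn ();
    /* SMSTART SM is emitted before this call if we are currently in
       non-streaming mode; likewise restored afterwards.  */
    streaming_fn ();
  }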
gcc/
* config/aarch64/aarch64-passes.def
(pass_late_thread_prologue_and_epilogue): New pass.
* config/aarch64/aarch64-sme.md: New file.
* config/aarch64/aarch64.md: Include it.
(*tb<optab><mode>1): Rename to...
(@aarch64_tb<optab><mode>): ...this.
(call, call_value, sibcall, sibcall_value): Don't require operand 2
to be a CONST_INT.
* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
the insn.
(make_pass_switch_pstate_sm): Declare.
* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
(CALL_USED_REGISTERS): Mark VG as call-preserved.
(aarch64_frame::old_svcr_offset): New member variable.
(machine_function::call_switches_sm_state): Likewise.
(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
(aarch64_cfun_incoming_pstate_sm): New function.
(aarch64_call_switches_pstate_sm): Likewise.
(aarch64_reg_save_mode): Return DImode for VG_REGNUM.
(aarch64_callee_isa_mode): New function.
(aarch64_insn_callee_isa_mode): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_switch_pstate_sm): Likewise.
(aarch64_sme_mode_switch_regs): New class.
(aarch64_record_sme_mode_switch_args): New function.
(aarch64_finish_sme_mode_switch_args): Likewise.
(aarch64_function_arg): Handle the end marker by returning a
PARALLEL that contains the ABI cookie that we used previously
alongside the result of aarch64_finish_sme_mode_switch_args.
(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
(aarch64_function_arg_advance): If a call would switch SM state,
record all argument registers that would need to be saved around
the mode switch.
(aarch64_need_old_pstate_sm): New function.
(aarch64_layout_frame): Decide whether the frame needs to store the
incoming value of PSTATE.SM and allocate a save slot for it if so.
If a function switches SME state, arrange to save the old value
of the DWARF VG register. Handle the case where this is the only
register save slot above the FP.
(aarch64_save_callee_saves): Handle saves of the DWARF VG register.
(aarch64_get_separate_components): Prevent such saves from being
shrink-wrapped.
(aarch64_old_svcr_mem): New function.
(aarch64_read_old_svcr): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_expand_prologue): Handle saves of the DWARF VG register.
Initialize any SVCR save slot.
(aarch64_expand_call): Allow the cookie to be a PARALLEL that contains
both the UNSPEC_CALLEE_ABI value and a list of registers that need
to be preserved across a change to PSTATE.SM. If the call does
involve such a change to PSTATE.SM, record the registers that
would be clobbered by this process. Also emit an instruction
to mark the temporary change in VG. Update call_switches_pstate_sm.
(aarch64_emit_call_insn): Return the emitted instruction.
(aarch64_frame_pointer_required): New function.
(aarch64_conditional_register_usage): Prevent VG_REGNUM from being
treated as a register operand.
(aarch64_switch_pstate_sm_for_call): New function.
(pass_data_switch_pstate_sm): New pass variable.
(pass_switch_pstate_sm): New pass class.
(make_pass_switch_pstate_sm): New function.
(TARGET_FRAME_POINTER_REQUIRED): Define.
* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.