git.ipfire.org Git - thirdparty/gcc.git/log

Support Intel MOVRS

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Expand for
prefetchrst2.
* common/config/i386/cpuinfo.h (get_available_features): Detect movrs.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_MOVRS_SET): New.
(OPTION_MASK_ISA2_MOVRS_UNSET): Ditto.
(ix86_handle_option): Handle -mmovrs.
* common/config/i386/i386-cpuinfo.h
(enum processor_features): Add FEATURE_MOVRS.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for movrs.
* config.gcc: Add movrsintrin.h
* config/i386/cpuid.h (bit_MOVRS): New.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (CHAR, PCCHAR), (SHORT, PCSHORT), (INT, PCINT),
(INT64, PCINT64).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add
__MOVRS__.
* config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Define
__MOVRS__.
* config/i386/i386-isa.def (MOVRS): Add DEF_PTA(MOVRS)
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle movrs.
* config/i386/i386.md (movrs<mode>): New.
* config/i386/i386.opt: Add option -mmovrs.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include movrsintrin.h
* config/i386/sse.md (unspecv): Add UNSPEC_VMOVRS.
(VI1248_AVX10_2): New.
(avx10_2_movrs_vmovrs<ssemodesuffix><mode><mask_name>): New define_insn.
* config/i386/xmmintrin.h: Add prefetchrst2.
* doc/extend.texi: Document movrs.
* doc/invoke.texi: Document -mmovrs.
* doc/rtl.texi: Document extension of prefetchrst2.
* doc/sourcebuild.texi: Document target movrs.
* config/i386/movrsintrin.h: New.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mmovrs.
* g++.dg/other/i386-3.C: Ditto.
* gcc.c-torture/execute/builtin-prefetch-1.c: Expand rws.
* gcc.dg/builtin-prefetch-1.c: Ditto.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mmovrs.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add movrs.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-movrs-1.c: New test.
* gcc.target/i386/avx10_2-movrs-1.c: Ditto.
* gcc.target/i386/movrs-1.c: Ditto.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
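
For illustration, code can test the __MOVRS__ macro listed above to see
whether the feature is enabled for the current translation unit.  This is
a minimal sketch; the function name movrs_enabled is hypothetical and not
part of the patch.

```c
/* Hypothetical helper: returns 1 when this translation unit was compiled
   with MOVRS enabled (for example with -mmovrs), and 0 otherwise.  */
int
movrs_enabled (void)
{
#ifdef __MOVRS__
  return 1;
#else
  return 0;
#endif
}
```

The check is made at compile time only; it says nothing about whether the
CPU the program later runs on supports the instructions.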

Support Intel AMX-FP8

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_available_features): Detect amx-fp8.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_FP8_SET): New macros.
(OPTION_MASK_ISA2_AMX_FP8_UNSET): Ditto.
(ix86_handle_option): Handle -mamx-fp8.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_FP8.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-fp8.
* config.gcc: Add amxfp8intrin.h.
* config/i386/cpuid.h (bit_AMX_FP8): New.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Define __AMX_FP8__.
* config/i386/i386-isa.def (AMX_FP8): Add DEF_PTA for AMX_FP8.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Add new ATTR.
* config/i386/i386.opt: Add -mamx-fp8.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxfp8intrin.h.
* doc/extend.texi: Document -mamx-fp8.
* doc/invoke.texi: Document -mamx-fp8.
* doc/sourcebuild.texi: Document -mamx-fp8.
* config/i386/amxfp8intrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-fp8.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Check for amx-fp8.
* gcc.target/i386/amx-helper.h: Ditto.
* gcc.target/i386/fp8-helper.h: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-fp8.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: New proc.
* gcc.target/i386/amxfp8-asmatt-1.c: New test.
* gcc.target/i386/amxfp8-asmintel-1.c: Ditto.
* gcc.target/i386/amxfp8-dpbf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dpbhf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dphbf8ps-2.c: Ditto.
* gcc.target/i386/amxfp8-dphf8ps-2.c: Ditto.
* gcc.target/i386/fp-emulation.h: Emulates NaN behaviour.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

Support Intel AMX-TRANSPOSE

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-TRANSPOSE.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_TRANSPOSE_SET,
OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET): New.
(ix86_handle_option): Handle -mamx-transpose.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_TRANSPOSE.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-transpose.
* config.gcc: Add amxtransposeintrin.h.
* config/i386/cpuid.h (bit_AMX_TRANSPOSE): New.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AMX_TRANSPOSE__.
* config/i386/i386-isa.def (AMX_TRANSPOSE): Add
DEF_PTA(AMX_TRANSPOSE).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-transpose.
* config/i386/i386.opt: Add option -mamx-transpose.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxtransposeintrin.h.
* doc/extend.texi: Document amx-transpose.
* doc/invoke.texi: Document -mamx-transpose.
* doc/sourcebuild.texi: Document target amx-transpose.
* config/i386/amxtransposeintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-transpose.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add new check for amx-transpose.
(__tilepair): New.
(zero_pair_tile_src): New.
(check_pair_tile_register): New.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/amx-helper.h: Add amx-transpose support.
(init_pair_tile_src): New function.
* gcc.target/i386/sse-12.c: Add -mamx-transpose.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-transpose.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_transposed): New.
* gcc.target/i386/amxtranspose-asmatt-1.c: New test.
* gcc.target/i386/amxtranspose-asmintel-1.c: Ditto.
* gcc.target/i386/amxtranspose-2rpntlvw-2.c: Ditto.
* gcc.target/i386/amxtranspose-conjtcmmimfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-conjtfp16-2.c: Ditto.
* gcc.target/i386/amxtranspose-tcmmimfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tcmmrlfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tdpbf16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tdpfp16ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-tmmultf32ps-2.c: Ditto.
* gcc.target/i386/amxtranspose-transposed-2.c: Ditto.

Support Intel AMX-TF32

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-TF32.
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_TF32_SET,
OPTION_MASK_ISA2_AMX_TF32_UNSET): New.
(ix86_handle_option): Handle -mamx-tf32.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_TF32.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-tf32.
* config.gcc: Add amxtf32intrin.h
* config/i386/cpuid.h (bit_AMX_TF32): New.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle amx-tf32.
* config/i386/i386-isa.def (AMX_TF32): Add DEF_PTA(AMX_TF32).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-tf32.
* config/i386/i386.opt: Add option -mamx-tf32.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxtf32intrin.h.
* doc/extend.texi: Document amx-tf32.
* doc/invoke.texi: Document -mamx-tf32.
* doc/sourcebuild.texi: Document target amx-tf32.
* config/i386/amxtf32intrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-tf32.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add cpu check for AMX-TF32.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-tf32.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-tf32.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_tf32): New.
* gcc.target/i386/amx-helper.h: New file for tf32 support.
* gcc.target/i386/amxtf32-asmatt-1.c: New test.
* gcc.target/i386/amxtf32-asmintel-1.c: Ditto.
* gcc.target/i386/amxtf32-mmultf32ps-2.c: Ditto.

Support Intel AMX-AVX512

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-AVX512.
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_AVX512_SET,
OPTION_MASK_ISA2_AMX_AVX512_UNSET): New.
(ix86_handle_option): Handle -mamx-avx512.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_AVX512.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-avx512.
* config.gcc: Add amxavx512intrin.h
* config/i386/cpuid.h (bit_AMX_AVX512): New.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Handle amx-avx512.
* config/i386/i386-isa.def (AMX_AVX512): Add DEF_PTA(AMX_AVX512).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-avx512.
* config/i386/i386.opt: Add option -mamx-avx512.
* config/i386/i386.opt.urls: Regenerated.
* config/i386/immintrin.h: Include amxavx512intrin.h
* doc/extend.texi: Document amx-avx512.
* doc/invoke.texi: Document -mamx-avx512.
* doc/sourcebuild.texi: Document target amx-avx512.
* config/i386/amxavx512intrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-avx512.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add cpu check for AMX-AVX512.
* gcc.target/i386/amx-helper.h: Support amx-avx512.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-avx512.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-avx512.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_avx512): New.
* gcc.target/i386/amxavx512-asmatt-1.c: New test.
* gcc.target/i386/amxavx512-asmintel-1.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowd2ps-2.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowps2pbf16-2.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowps2ph-2.c: Ditto.
* gcc.target/i386/amxavx512-movrow-2.c: Ditto.

Co-authored-by: Yu, Bing <bing1.yu@intel.com>

Support Intel SM4 EVEX instructions

gcc/ChangeLog:

* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
V16SI_FTYPE_V16SI_V16SI.
* config/i386/sm4intrin.h: Add zmm insns.
* config/i386/sse.md (vsm4key4_<mode>): Add EVEX pattern.
(vsm4rnds4_<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sm4-check.h: Add zmm test.
* gcc.target/i386/sm4-avx10_2-1.c: New test.
* gcc.target/i386/sm4-avx10_2-512-1.c: Ditto.
* gcc.target/i386/sm4key4-avx10_2-512-2.c: Ditto.
* gcc.target/i386/sm4rnds4-avx10_2-512-2.c: Ditto.

testsuite: g++.dg: debug: fix test filenames

gcc/testsuite/ChangeLog:
PR debug/15736
PR debug/46240

* g++.dg/debug/pr15736.cc: Move to...
* g++.dg/debug/pr15736.C: ...here.
* g++.dg/debug/pr46240.cc: Move to...
* g++.dg/debug/pr46240.C: ...here.

testsuite: g++.dg: torture: fix PR111520 filename

gcc/testsuite/ChangeLog:
PR tree-optimization/111520

* g++.dg/torture/harden-comp-pr111520.cc: Move to...
* g++.dg/torture/harden-comp-pr111520.C: ...here.

testsuite: g++.dg: fix PR90313 filename

gcc/testsuite/ChangeLog:
PR c++/90313

* g++.dg/torture/pr90313.cc: Move to...
* g++.dg/torture/pr90313.C: ...here.

testsuite: fixup pr66655.C

In r15-4823-g14e2f3233bf0ef, I renamed pr66655_1.cc but neglected
to update a dg-additional-sources reference.

gcc/testsuite/ChangeLog:
PR target/66655

* g++.dg/pr66655.C: Adjust filename in dg-additional-sources.

testsuite: g++.dg: rename pr66655 test

The test was being ignored because dg.exp looks for .C in g++.dg/.
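
As a rough illustration of why a .cc or .c file is skipped, a
case-sensitive suffix match could look like the sketch below.  This is a
hypothetical C helper written for this note; the real selection is done
by the Tcl code in dg.exp, not by anything like this function.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if NAME ends in the case-sensitive
   suffix ".C".  */
bool
has_cxx_test_suffix (const char *name)
{
  size_t len = strlen (name);
  return len >= 2 && strcmp (name + len - 2, ".C") == 0;
}
```

With this rule pr66655_1.C matches, while pr66655_1.cc and pr105820.c
do not.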

gcc/testsuite/ChangeLog:
PR target/66655

* g++.dg/pr66655_1.cc: Move to...
* g++.dg/pr66655_1.C: ...here.

testsuite: g++.dg: rename pr42965 test

.c is used for C and .C is for C++. The test was being ignored before.

gcc/testsuite/ChangeLog:
PR other/42965

* g++.dg/warn/unused-result1-Werror.c: Move to...
* g++.dg/warn/unused-result1-Werror.C: ...here.

testsuite: g++.dg: rename pr105820 test

.c is used for C and .C is for C++. The test was being ignored before.

gcc/testsuite/ChangeLog:
PR tree-optimization/105820

* g++.dg/tree-ssa/pr105820.c: Move to...
* g++.dg/tree-ssa/pr105820.C: ...here.

testsuite: move single-file LTO pr47333 test to torture

This only started being used recently in r15-4683-g04e0fbbc34e101 and
pinskia pointed out we may as well make it a proper torture test
instead as it's a single file LTO test.

gcc/testsuite/ChangeLog:
PR target/47333

* g++.dg/lto/pr47333_0.C: Move to...
* g++.dg/torture/pr47333.C: ...here.

testsuite: move single-file LTO pr95677 test to torture

This only started being used recently in r15-4681-g96110c14cf61a1 and
pinskia pointed out we may as well make it a proper torture test
instead as it's a single file LTO test.

gcc/testsuite/ChangeLog:
PR c++/95677

* g++.dg/lto/pr95677_0.C: Move to...
* g++.dg/torture/pr95677.C: ...here.

libiberty: Fix comment typos

These comment typos were found in the valgrind fork of libiberty
demangle code.

libiberty/ChangeLog:

* cplus-dem.c: Change preceeded to preceded.

include/ChangeLog:

* safe-ctype.h: Change accidently to accidentally.

aarch64: Require SVE2 and/or SME2 for SVE FAMINMAX intrinsics

After the previous patch, we can now accurately model the ISA
requirements for the SVE FAMINMAX intrinsics.  They can be used
in non-streaming mode if TARGET_SVE2 and in streaming mode if
TARGET_SME2 (with both cases also requiring TARGET_FAMINMAX).
They can be used in streaming-compatible mode if TARGET_SVE2
&& TARGET_SME2.

Also, Kyrill pointed out in the original review of the FAMINMAX
support that it would be more consistent to define the rtl patterns
in aarch64-sve2.md rather than aarch64-sve.md, so the pushed patch
did that.  This patch moves the definitions of the intrinsics to
the sve2 files too, for consistency.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svamax, svamin): Move
definitions to...
* config/aarch64/aarch64-sve-builtins-sve2.cc: ...here.
* config/aarch64/aarch64-sve-builtins-base.def (svamax, svamin): Move
definitions to...
* config/aarch64/aarch64-sve-builtins-sve2.def: ...here.  Require
SME2 in streaming mode.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/amin_1.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f16.c: Enable sve2 and
(for streaming mode) sme2.
* gcc.target/aarch64/sve2/acle/asm/amax_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amax_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f64.c: Likewise.

aarch64: Record separate streaming and non-streaming ISA requirements

For some upcoming extensions, we need to add intrinsics whose
ISA requirements differ between streaming mode and non-streaming mode.
This patch tries to generalise the infrastructure to support that:

- Rather than have a single set of feature flags, the patch uses a
  separate set for sm_off (non-streaming, PSTATE.SM==0) and sm_on
  (streaming, PSTATE.SM==1).

- The sm_off set is zero if the intrinsic is streaming-only.
  Otherwise it is AARCH64_FL_SM_OFF | <requirements>.

- Similarly, the sm_on set is zero if the intrinsic is non-streaming-only.
  Otherwise it is AARCH64_FL_SM_ON | <requirements>.  AARCH64_FL_SME is
  taken as given in streaming mode.

- Streaming-compatible code must satisfy both sets of requirements.

There should be no functional change.
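
The scheme above can be sketched in plain C as follows.  All names and
flag values here (FL_*, struct required_extensions, extensions_ok) are
hypothetical stand-ins chosen for illustration; they are not the
definitions in aarch64-protos.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real AARCH64_FL_* values differ.  */
#define FL_SM_OFF (UINT64_C (1) << 0)
#define FL_SM_ON  (UINT64_C (1) << 1)
#define FL_SVE2   (UINT64_C (1) << 2)
#define FL_SME2   (UINT64_C (1) << 3)

struct required_extensions
{
  uint64_t sm_off;  /* Zero if the intrinsic is streaming-only.  */
  uint64_t sm_on;   /* Zero if the intrinsic is non-streaming-only.  */
};

enum sm_state { SM_OFF, SM_ON, SM_COMPATIBLE };

/* Return true if the ENABLED features satisfy REQ in mode MODE.  */
bool
extensions_ok (struct required_extensions req, uint64_t enabled,
               enum sm_state mode)
{
  /* Strip the mode markers so only feature requirements remain.  */
  uint64_t need_off = req.sm_off & ~FL_SM_OFF;
  uint64_t need_on = req.sm_on & ~FL_SM_ON;

  switch (mode)
    {
    case SM_OFF:
      return req.sm_off != 0 && (enabled & need_off) == need_off;
    case SM_ON:
      return req.sm_on != 0 && (enabled & need_on) == need_on;
    case SM_COMPATIBLE:
      /* Streaming-compatible code must satisfy both sets.  */
      return (req.sm_off != 0 && req.sm_on != 0
              && (enabled & (need_off | need_on)) == (need_off | need_on));
    }
  return false;
}
```

For an intrinsic that needs SVE2 outside streaming mode and SME2 inside
it, the sets would be { FL_SM_OFF | FL_SVE2, FL_SM_ON | FL_SME2 }, and
streaming-compatible callers would need both features enabled.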

gcc/
* config.gcc (aarch64*-*-*): Add aarch64-protos.h to target_gtfiles.
* config/aarch64/aarch64-protos.h
(aarch64_required_extensions): New structure.
(aarch64_check_required_extensions): Change the type of the
required_extensions parameter from aarch64_feature_flags to
aarch64_required_extensions.
* config/aarch64/aarch64-sve-builtins.h
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
(function_builder::get_attributes): Likewise.
(function_builder::add_function): Likewise.
(function_group_info): Change the type of required_extensions
in the same way.
* config/aarch64/aarch64-builtins.cc
(aarch64_pragma_builtins_data::required_extensions): Change the type
from aarch64_feature_flags to aarch64_required_extensions.
(aarch64_check_required_extensions): Likewise change the type
of the required_extensions parameter.  Separate the requirements
for non-streaming mode and streaming mode, ORing them together
for streaming-compatible mode.
(aarch64_general_required_extensions): New function.
(aarch64_general_check_builtin_call): Use it.
* config/aarch64/aarch64-sve-builtins.cc
(registered_function::required_extensions): Change the type
from aarch64_feature_flags to aarch64_required_extensions.
(DEF_NEON_SVE_FUNCTION, DEF_SME_ZA_FUNCTION_GS): Update accordingly.
(function_builder::get_attributes): Change the type of the
required_extensions parameter from aarch64_feature_flags to
aarch64_required_extensions.
(function_builder::add_function): Likewise.
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
* config/aarch64/aarch64-simd-pragma-builtins.def: Update
REQUIRED_EXTENSIONS definitions to use aarch64_required_extensions.
* config/aarch64/aarch64-sve-builtins-base.def: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.

aarch64: Move ENTRY_VHSDF to aarch64-simd-pragma-builtins.def

It's more convenient for later patches if we only define ENTRY_VHSDF
once, in the .def file. Then the only macro that needs to be defined
before including the file is ENTRY itself.

The patch also moves the architecture requirements out of the
individual ENTRY invocations into a block-level definition of
REQUIRED_EXTENSIONS. This reduces cut-&-paste a little and makes
things more consistent with aarch64-sve-builtins*.def.

gcc/
* config/aarch64/aarch64-builtins.cc (ENTRY): Remove the features
argument and get the features from REQUIRED_EXTENSIONS instead.
(ENTRY_VHSDF): Move definition to...
* config/aarch64/aarch64-simd-pragma-builtins.def: ...here.
Move the architecture requirements to REQUIRED_EXTENSIONS.

aarch64: Forbid F64MM permutes in streaming mode

The current code was based on an early version of the SME spec,
which allowed the .Q forms of TRN1, TRN2, UZP1, UZP2, ZIP1, and ZIP2
to be used in streaming mode. We should now forbid them instead;
see https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/TRN1--TRN2--vectors---Interleave-even-or-odd-elements-from-two-vectors-?lang=en
and the corresponding entries for the others.

gcc/
* config/aarch64/aarch64-sve-builtins-base.def (svtrn1q, svtrn2q)
(svuzp1q, svuzp2q, svzip1q, svzip2q): Require SM_OFF.

gcc/testsuite/
* g++.target/aarch64/sve/aarch64-ssve.exp: Add tests for trn[12]q,
uzp[12]q, and zip[12]q.
* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Skip for
STREAMING_COMPATIBLE.
* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise.

testsuite: fix c23-constexpr-2a.c test to use dg-do run

The comment at the top of the test indicates it should be an execution test,
but it was only using 'dg-do link'. Correct that.

The only change in test results is as expected:
```
+PASS: gcc.dg/c23-constexpr-2a.c execution test
```

gcc/testsuite/ChangeLog:
PR testsuite/117183

* gcc.dg/c23-constexpr-2a.c: Use dg-do run.

c: detect variably-modified types [PR117145,PR117245,PR100420]

This fixes two cases where variably-modified types were not recognized as
such.  The first is when building composite types and the other when a type
is reconstructed for the 'vector' attribute.  Construction of types in
the C FE is reorganized to use c_build_* functions which are responsible for
setting C_TYPE_VARIABLE_SIZE, C_TYPE_VARIABLY_MODIFIED and TYPE_TYPELESS_STORAGE
based on the properties of the type itself and these replace all other logic
elsewhere (e.g. in grokdeclarator).  A new 'c_reconstruct_complex_type' based
on these functions is introduced which is called via a language hook when the
'vector' attribute is processed (as for C++).

One problem is arrays of unspecified size 'T[*]' which were represented
identically to zero-sized arrays but with C_TYPE_VARIABLE_SIZE set.  To avoid
having to create distinct type copies for this, the representation was changed
to make it a natural VLA by giving it an upper bound of '(0, 0)'.  This also
then allows fixing of PR100420 where such arrays were printed as 'T[0]'.
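
For readers less familiar with the terms, here is a small sketch of
variably-modified types and of 'T[*]' in a prototype.  The function names
are hypothetical and the code is not taken from the new tests.

```c
#include <stddef.h>

/* In a prototype, '[*]' declares an array of unspecified size.  */
size_t row_bytes (int n, int m, int a[*][*]);

/* The definition names the bounds.  The parameter is adjusted to the
   variably-modified pointer type 'int (*)[m]'.  */
size_t
row_bytes (int n, int m, int a[n][m])
{
  (void) n;
  return sizeof *a;  /* Evaluated at run time: m * sizeof (int).  */
}

/* A pointer to a VLA is variably modified without being a VLA itself.  */
size_t
pointee_bytes (int m)
{
  int row[m];
  int (*p)[m] = &row;
  return sizeof *p;  /* Also evaluated at run time.  */
}
```

Both functions return a size that depends on a run-time value, which is
the property the front end has to track on the type.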

Finally, a new function 'c_verify_type' checks consistency of properties
specific to C FE and is called when checking is on.

PR c/117145
PR c/117245
PR c/100420

gcc/c/ChangeLog:
* c-decl.cc (c_build_pointer_type): Move to c-typeck.cc.
(grokdeclarator): Simplify logic.
(match_builtin_function_types): Adapt.
(push_decl): Adapt.
(implicitly_declare): Adapt.
(c_update_type_canonical): Adapt.
(c_make_fname_decl): Adapt.
(start_function): Adapt.
* c-objc-common.h: Add LANG_HOOKS_RECONSTRUCT_COMPLEX_TYPE.
* c-tree.h: Add prototypes.
* c-typeck.cc (c_verify_type): New function.
(c_set_type_bits): New function.
(c_build_pointer_type): Moved from c-decl.cc.
(c_build_pointer_type_for_mode): New function.
(c_build_function_type): New function.
(c_build_array_type): New function.
(c_build_type_attribute_variant): New function.
(c_reconstruct_complex_type): New function.
(c_build_functype_attribute_variant): Renamed.
(array_to_pointer_conversion): Simplify logic.
(composite_type_internal): Simplify logic.
(build_unary_op): Simplify logic.
(comptypes_verify): Add checking assertions.
(c_build_qualified_type): Add checking assertions.
(c_build_function_call_vec): Adapt.
(qualify_type): Adapt.
(build_functype_attribute_variant): Adapt.
(common_pointer_type): Adapt.
(c_common_type): Adapt.
(convert_for_assignment): Adapt.
(type_or_builtin_type): Adapt.
(build_access_with_size_for_counted_by): Adapt.
(build_conditional_expr): Adapt.
(build_modify_expr): Adapt.
(build_binary_op): Adapt.
(build_omp_array_section): Adapt.
(handle_omp_array_sections): Adapt.
(c_finish_omp_clauses): Adapt.
* c-parser.cc (c_parser_typeof_specifier): Adapt.
(c_parser_generic_selection): Adapt.

gcc/c-family/ChangeLog:
* c-pretty-print.cc (c_pretty_printer::direct_abstract_declarator):
Detect arrays of unspecified size.

gcc/testsuite/ChangeLog:
* gcc.dg/c23-tag-composite-11.c: New test.
* gcc.dg/Warray-parameter-4.c: Resolve xfails.
* gcc.dg/Wvla-parameter-2.c: Resolve xfails.
* gcc.dg/Wvla-parameter-3.c: Resolve xfails.
* gcc.dg/pr117145-1.c: New test.
* gcc.dg/pr117145-2.c: New test.
* gcc.dg/pr117245.c: New test.

testsuite: Fix prototype in gcc.dg/pr114115.c

One test failing with a -std=gnu23 default that I wanted to
investigate further is gcc.dg/pr114115.c. Building with -std=gnu23
produces a warning:

pr114115.c:18:8: warning: 'ifunc' resolver for 'foo_ifunc2' should return 'void * (*)(void)' [-Wattribute-alias=]

It turns out that this warning (from cgraphunit.cc) is disabled for
unprototyped functions. Fix the return type for foo_ifunc2 so the
test builds without warnings both with and without -std=gnu23.

Tested for x86_64.

* gcc.dg/pr114115.c (foo_ifunc2): Return void.

Add autoconf check for clock_gettime

Reported by Andrew Stubbs

gcc/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for HAVE_CLOCK_GETTIME.
* timevar.cc (get_time): Use HAVE_CLOCK_GETTIME.
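
A minimal sketch of how such a configure macro is typically consumed.
The helper name wall_seconds, the choice of CLOCK_MONOTONIC and the
time () fallback are assumptions made for this example; they are not
taken from timevar.cc.

```c
#include <time.h>

/* HAVE_CLOCK_GETTIME would normally come from the configure-generated
   config.h; here it stays undefined unless passed with -D.  */

/* Hypothetical helper: current time in seconds as a double.  */
double
wall_seconds (void)
{
#ifdef HAVE_CLOCK_GETTIME
  struct timespec ts;
  if (clock_gettime (CLOCK_MONOTONIC, &ts) == 0)
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
#endif
  /* Fallback with one-second resolution, using only standard C.  */
  return (double) time (NULL);
}
```

Guarding the call keeps the file buildable on hosts where configure
does not find clock_gettime.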

testsuite: Use noinline in gcc.dg/simulate-thread/simulate-thread.h

Among the changes of test results with a -std=gnu23 default were two
tests changing from PASS to UNSUPPORTED:

UNSUPPORTED: gcc.dg/simulate-thread/speculative-store.c   -O2 -g  thread simulation test
UNSUPPORTED: gcc.dg/simulate-thread/speculative-store.c   -O3 -g  thread simulation test

It appears that functions defined with () becoming prototyped affects
inlining, and changing the code to use (void) allows UNSUPPORTED
results to be reproduced with -std=gnu17.  Add __attribute__
((noinline)) on one more function to avoid the UNSUPPORTED results;
some of the tests in this directory already have such an attribute on
some functions.
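
A minimal sketch of the pattern, using hypothetical names rather than
the contents of simulate-thread.h: a function with a (void) prototype
that is marked noinline so the optimizer keeps it as a separate
function at -O2 and -O3.

```c
/* Hypothetical state checked by the verification hook.  */
static int final_state_ok = 1;

/* Marked noinline so that calls to it are not inlined.  */
__attribute__ ((noinline)) int
wrapper_final_verify (void)
{
  return final_state_ok ? 0 : 1;
}
```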

Tested for x86_64-pc-linux-gnu.

* gcc.dg/simulate-thread/simulate-thread.h
(simulate_thread_wrapper_final_verify): Mark noinline.

RISC-V: fix const interleaved stepped vector with a scalar pattern

When bisecting for ICE in PR/117353, commit 771256bcb9dd ("RISC-V: Emit costs for
bool and stepped const vectors") uncovered yet another latent issue (first noted [1])

  [1] https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1625

This patch fixes some of the fortran regressions from that report.

Fixes 71a5ac6703d1 ("RISC-V: Support interleave vector with different step sequence")

rv64imafdcv_zvl256b_zba_zbb_zbs_zicond/lp64d/medlow
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
                            |  392 /   108 |    7 /     3 |   91 /    24 |
                            |  392 /   108 |    7 /     3 |   67 /    12 |

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use IOR op.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/slp-interleave-5.c: New test.

Tested-by: Edwin Lu <ewlu@rivosinc.com> # Pre-commit CI #2503
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

diagnostics: add class lazy_diagnostic_path

This patch adds a new class lazy_diagnostic_path for
use when creating rich_location instances, to allow deferring
expensive computations until the path is actually used (when
a diagnostic using the rich_location is emitted).
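
The idea of deferring work until first use can be sketched in C as
below.  This is only an analogy with hypothetical names (struct
lazy_value, lazy_get, count_and_answer); the class added by the patch
is C++ and its interface is not reproduced here.

```c
/* A value that is computed on first request and cached afterwards.  */
struct lazy_value
{
  int (*compute) (void *ctx);  /* Expensive callback.  */
  void *ctx;                   /* Opaque data passed to COMPUTE.  */
  int cached;                  /* Valid only when COMPUTED is nonzero.  */
  int computed;                /* Nonzero once COMPUTE has run.  */
};

/* Return the value, running the callback at most once.  */
int
lazy_get (struct lazy_value *lv)
{
  if (!lv->computed)
    {
      lv->cached = lv->compute (lv->ctx);
      lv->computed = 1;
    }
  return lv->cached;
}

/* Example callback that counts how many times it runs.  */
int
count_and_answer (void *ctx)
{
  int *calls = ctx;
  ++*calls;
  return 42;
}
```

If lazy_get is never called, the callback never runs, so the cost is
paid only when the result is actually needed.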

gcc/ChangeLog:
* Makefile.in (OBJS): Add lazy-diagnostic-path.o.
* lazy-diagnostic-path.cc: New file.
* lazy-diagnostic-path.h: New file.
* selftest-diagnostic.cc: Include "diagnostic-format.h".
(test_diagnostic_context::test_diagnostic_context): Turn off
flushing for the output format's printer.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::lazy_diagnostic_path_cc_tests.
* selftest.h (selftest::lazy_diagnostic_path_cc_tests): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: use std::move in output_factory::handler ctor

gcc/ChangeLog:
* opts-diagnostic.cc (output_factory::handler::handler): Use
std::move on name.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix memory leak of m_option_mgr

Fix some noise seen in "make selftest-valgrind".

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::finish): Delete and reset
m_option_mgr.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[PATCH v2] RISC-V: Fix gcc.target/riscv/rvv/base/cpymem-1.c f3

The function body checks for f3 only ran with -mcmodel explicitly set
which meant I missed a regression in my local testing of:

  commit b039d06c9a810a3fab4c5eb9d50b0c7aff94b2d8
  Author: Craig Blackmore <craig.blackmore@embecosm.com>
  Date:   Fri Oct 18 09:17:21 2024 -0600

      [PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation

The failure showed up in the rivos CI and it is due to f3 now using
LMUL m1 instead of m8.

I have reworked the test to make it more robust and maintainable.  This
allowed most of the special casing of command line arguments to be
removed.  It also fixes an issue where some targets would enable
multiple versions of the function body check e.g. `-march=rv32gcv
-mcmodel=medany`.

Changes since v1: Added missing ChangeLog.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-1.c: Fix and rework f3.

testsuite: add testcase for fixed PR106073

This was fixed by r12-8835-ge8d5f3a1b5a583 which surely made it latent
but richi points out it was likely an instance of PR90348. -fstack-reuse
continues to be a menace, so let's add the testcase.

gcc/testsuite/ChangeLog:
PR middle-end/90348
PR tree-optimization/106073

* gcc.dg/pr106073.c: New test.

middle-end: Lower all gconds during vector pattern matching [PR117176]

I have been taking a look at boolean handling once more in the vectorizer.

There are two situations to consider:

  1. when the booleans being created come from comparing data inputs then
     for the resulting vector boolean we need to know the vector type and the
     precision.  In this case, when we have an operation such as NOT on the data
     element, this has to be lowered to XOR because the truncation to the vector
     precision needs to be explicit.
  2. when the boolean being created comes from another boolean operation, then
     we don't need to lower NOT, as the precision doesn't change.  We don't do
     any lowering for these (as denoted in check_bool_pattern) and instead the
     precision is copied from the element feeding the boolean statement during
     VF analysis.

For early break gcond lowering in order to correctly handle the second scenario
above we punted the lowering of VECT_SCALAR_BOOLEAN_TYPE_P comparisons that were
already in the right shape.  e.g. e != 0 where e is a boolean does not need any
lowering.

The issue however is that the statement feeding e may need to be lowered in the
case where it's a data expression.

This patch changes a bit how we do the lowering.  We now always emit an
additional compare.  For example, if the input is:

  if (e != 0)

where e is a boolean, we would previously punt on this, but now we generate

  f = e != 0
  if (f != 0)

We then use the same infrastructure as recog_bool to ask it to lower f, and in
doing so handle any boolean conversions that need to be lowered.

Because we now guarantee that f is an internal def we can also simplify the
SLP building code.

When e is a boolean, the precision we build for f needs to reflect the precision
of the operation feeding e.  To get this value we use integer_type_for_mask the
same way recog_bool does, and if it's defined (e.g. we have a data conversion
somewhere) we pass that precision on instead.  This gets us the correct VF
on the newly lowered boolean expressions.
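
As a source-level analogy only (the patch rewrites GIMPLE, not C), the extra
compare can be pictured as below.  The function first_mismatch and its
variable names are hypothetical and do not come from the patch or its test.

```c
#include <stdbool.h>

/* Early-break loop whose exit condition is a boolean built from data.  */
int first_mismatch (const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      bool e = a[i] != b[i];  /* boolean produced by a data comparison */
      bool f = e != 0;        /* the additional compare that is now always emitted */
      if (f != 0)             /* the gcond now tests the internal def f */
        return i;
    }
  return -1;
}
```

Because f is always defined next to the branch, the lowering of e (and of
whatever feeds e) can be handled separately from the branch itself.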

gcc/ChangeLog:

PR tree-optimization/117176
* tree-vect-patterns.cc (vect_recog_gcond_pattern): Lower all gconds.
* tree-vect-slp.cc (vect_analyze_slp): No longer check for in vect def.

gcc/testsuite/ChangeLog:

PR tree-optimization/117176
* gcc.dg/vect/vect-early-break_130-pr117176.c: New test.

OpenMP/C++: Use STRIP_REFERENCE_REF to fix declare variant with reference-returning functions

As Jakub suggested, use STRIP_REFERENCE_REF instead of doing it manually
as r15-4800-geb828a1e380e7b did.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Use STRIP_REFERENCE_REF
instead of doing it manually.

RISC-V: Do not inline when callee is versioned but caller is not

When the callee is versioned but the caller is not, we should not inline
the callee into the caller, to prevent the default version of the callee
from being inlined into a not versioned caller.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_can_inline_p): Refuse to inline
when callee is versioned but caller is not.

OpenMP/C++: Fix declare variant with reference-returning functions

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Strip indirect ref
around variant-function call when processing a variant.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/declare-variant-9.C: New test.

RISC-V: Split riscv_process_target_attr with const char *args argument

This patch splits static bool riscv_process_target_attr
(tree args, location_t loc) into two functions:

- bool riscv_process_target_attr (const char *args, location_t loc)
- static bool riscv_process_target_attr (tree args, location_t loc)

Thus, we can call `riscv_process_target_attr` with a `const char *`
argument. This is useful for implementation of `target_version`
attribute.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_process_target_attr): New.
* config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
Split into two functions with const char *args argument

libstdc++: Add align_alloc attribute to aligned operator new

The aligned versions of operator new should use the align_alloc
attribute to help the compiler.

PR c++/86878 requests that the compiler would use the attribute to warn
about invalid attributes, so an XFAILed test is added for that.
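
For illustration, GCC's manual documents this function attribute under the
name alloc_align.  The sketch below applies it to a plain C allocator; the
wrapper my_aligned_new is hypothetical and is not the libstdc++ declaration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: parameter 1 is the size, parameter 2 the alignment.
   alloc_align (2) tells GCC that the returned pointer is aligned to the
   value passed as the second argument.  */
__attribute__ ((malloc, alloc_size (1), alloc_align (2)))
void *my_aligned_new (size_t size, size_t align)
{
  /* C11 aligned_alloc wants size to be a multiple of align, so round up.  */
  size_t rounded = (size + align - 1) / align * align;
  return aligned_alloc (align, rounded);
}
```

A caller such as my_aligned_new (100, 64) can then be optimized on the
assumption that the result is 64-byte aligned.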

libstdc++-v3/ChangeLog:

* libsupc++/new (operator new): Add attribute align_alloc(2) to
overloads taking a std::align_val_t argument.
* testsuite/18_support/new_aligned_warn.cc: New test.

Reviewed-by: Jakub Jelinek <jakub@redhat.com>

expand: Fix up expansion of VIEW_CONVERT_EXPR to BITINT_TYPE [PR117354]

The following testcase ICEs, because when trying to expand the
VIEW_CONVERT_EXPR operand which is SSA_NAME defined to
V32QI or V4DI MEM_REF which is aligned just to 8 bytes we force
it as unaligned into a register, but then try to call extract_bit_field
from the V32QI or V4DI register to BLKmode.  extract_bit_field obviously
doesn't support BLKmode extraction and so ICEs.

The second hunk fixes the ICE by not calling extract_bit_field when
it can't handle it; the last if will handle it properly by storing
it to memory and using BLKmode access to the copy.

The first hunk is an optimization: if mode is BLKmode, by setting the
inner_reference_p argument to expand_expr_real we avoid the
expand_misaligned_mem_ref calls which load it from memory into a register.

2024-10-31  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117354
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Pass
true as inner_reference_p argument to expand_expr_real if
mode is BLKmode.  Don't call extract_bit_field if mode is BLKmode.

* gcc.dg/bitint-113.c: New test.

RISC-V: allow -fno-plt to disable PLT

Currently, the RISC-V target uses the target specific mplt option to
control PLT generation. This patch deprecates the target specific mplt
option and uses the common fplt option instead. This allows users to
use the same option for most targets.

Co-Developed-by: Liao Shihua <shihua@iscas.ac.cn>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/predicates.md: Use flag_plt instead of TARGET_PLT.
* config/riscv/riscv.opt: alias common option fplt to mplt.

tree: Fix up comment wording in valid_new_delete_pair_p

I've noticed a duplicated word in a comment, fixed thusly.

2024-10-31 Jakub Jelinek <jakub@redhat.com>

* tree.cc (valid_new_delete_pair_p): Fix up duplicate "or or"
in comment.

Fortran: Fix problem with substring selectors in ASSOCIATE [PR115700]

2024-10-31 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/115700
* resolve.cc (resolve_variable): The typespec of an expression,
which is not a substring, can be shared with a deferred length
associate name.
(resolve_assoc_var): Extract a substring reference with non-
constant start or end. Use it to flag up the need for array
associate name to be a pointer.
(resolve_block_construct): Change comment from past to future
tense.

gcc/testsuite/
PR fortran/115700
* gfortran.dg/associate_70.f90: New test.

testsuite: fix syntax in Wstringop-overflow-59.c

Fix quoting issues, escaping, and dg directive types.

There were two issues here:
1) The incorrect quoting in an earlier dg-message was covering up that the
syntax in the next part was wrong;
2) Fix dg-warning -> dg-message to correctly pick up the notes. Once 1) was
fixed, this was exposed.

With this, I get:
```
+PASS: gcc.dg/Wstringop-overflow-59.c note (test for warnings, line 192)
+PASS: gcc.dg/Wstringop-overflow-59.c note (test for warnings, line 193)
```

gcc/testsuite/ChangeLog:
PR middle-end/92936

* gcc.dg/Wstringop-overflow-59.c: Fix dg-* syntax.

gimple: Remove special handling of COND_EXPR for COMPARISON_CLASS_P [PR116949, PR114785]

After r13-707-g68e0063397ba82, COND_EXPR for gimple assign could no longer contain a comparison.
The vectorizer was building gimple assigns with comparisons until r15-4695-gd17e672ce82e69
(which added an assert to make sure it no longer builds them).

So let's remove the special handling of COND_EXPR in a few places and add an assert to
gimple_build_assign_1 to make sure we don't build a gimple assign with a comparison any more.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR middle-end/114785
PR middle-end/116949
* gimple-match-exports.cc (maybe_push_res_to_seq): Remove special
handling of COMPARISON_CLASS_P in COND_EXPR/VEC_COND_EXPR.
(gimple_extract): Likewise.
* gimple-walk.cc (walk_stmt_load_store_addr_ops): Likewise.
* gimple.cc (gimple_build_assign_1): Add assert for COND_EXPR
so its 1st operand is not a comparison.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

libstdc++: Fix copy&paste comments in vector range tests

These comments were copied from the std::vector<bool> tests, but the
value_type is not bool in these ones.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/cons/from_range.cc: Fix copy &
paste error in comment.
* testsuite/23_containers/vector/modifiers/append_range.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/assign_range.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/insert/insert_range.cc:
Likewise.

libstdc++: Fix some typos and grammatical errors in docs

Also remove some redundant 'void' parameters from code examples.

libstdc++-v3/ChangeLog:

* doc/xml/manual/using_exceptions.xml: Fix typos and grammatical
errors.
* doc/html/manual/using_exceptions.html: Regenerate.

[PATCH] Fix SLP when ifcvt versioned loop is not vectorized

When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If the
vector loop is not vectorized and is removed, the scalar loop is still left with
dont_vectorize set. As a result, BB vectorization will not happen.

This patch resets dont_vectorize on the scalar loop when IFN_LOOP_VECTORIZED
is set to false.

gcc/ChangeLog:

* tree-vectorizer.cc (pass_vectorize::execute): Reset dont_vectorize
to scalar loop when setting IFN_LOOP_VECTORIZED to false.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-77.c: New test.

[PATCH] Adjust param_vect_max_version_for_alias_checks

This patch raises the default of param_vect_max_version_for_alias_checks
from 10 to 15.  The old limit was causing GCC to miss vectorization
opportunities for an application, making it about 14% slower than with LLVM.

The original default of 10 is itself arbitrary. Given that GCC's vectoriser
does consider the cost of alias checks, increasing this param is reasonable.

In this case we need a value of at least 11, whereas the current
default is 10.
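
To picture where such checks come from, here is a hypothetical kernel (not
the application mentioned above, and not the new test).  None of the six
pointers is declared restrict, so a vectorized version of the loop generally
has to be guarded by runtime overlap checks between the stores and the other
accesses.  How many checks GCC actually emits for this loop is not claimed
here.

```c
#include <stddef.h>

/* Hypothetical kernel: four output streams, two inputs, no restrict.  */
void combine_streams (float *o0, float *o1, float *o2, float *o3,
                      const float *x, const float *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      o0[i] = x[i] + y[i];
      o1[i] = x[i] - y[i];
      o2[i] = x[i] * y[i];
      o3[i] = x[i] * x[i];
    }
}
```

The limit can also be set per compilation with
--param vect-max-version-for-alias-checks=N.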

gcc/ChangeLog:

* params.opt: Adjust param_vect_max_version_for_alias_checks

gcc/testsuite/ChangeLog:

* g++.dg/alias-checks.C: New test.

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>

c: Do not document C23 support as experimental and incomplete

Since C23 support is substantially feature-complete, update
documentation to no longer refer to it as experimental and incomplete.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
* doc/cpp.texi (__STDC_VERSION__): Do not refer to C23 support as
experimental.
* doc/invoke.texi (std=c23, std=gnu23): Do not document as
experimental and incomplete.
* doc/standards.texi: Do not refer to C23 support as experimental
and incomplete.

gcc/c-family/
* c.opt (std=c23, std=gnu23, std=iso9899:2024): Do not mark as
experimental and incomplete.

syscall: don't define syscall stub on Hurd

Patch from Samuel Thibault.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/623415

Remove sys/user time in -ftime-report

Retrieving sys/user time in timevars is quite expensive because it
always needs a system call. Only getting the wall time is much
cheaper because operating systems have optimized paths for this.

The sys time isn't that interesting for a compiler and wall time
is usually close to user time except when the system is overloaded.
On the other hand, when it is not overloaded, wall time is more accurate
because it has less overhead.

For building tramp3d with -O0 the -ftime-report overhead drops from
18% to 3%. For -O2 it drops from 8% to not measurable.

I changed the code to use gettimeofday as a fallback for clock_gettime
CLOCK_MONOTONIC. If a host has neither of those the time will not
be measured. Previously clock was the fallback.
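
The fallback order described above could look roughly like the sketch below.
This is not the code in timevar.cc; the helper name wall_time_ns and the
choice of nanoseconds as the unit are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: wall time in nanoseconds, or 0 if it cannot be
   measured on this host.  */
uint64_t wall_time_ns (void)
{
#if defined (CLOCK_MONOTONIC)
  /* Preferred source: the monotonic clock.  */
  struct timespec ts;
  if (clock_gettime (CLOCK_MONOTONIC, &ts) == 0)
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
  /* Fallback: gettimeofday, which only has microsecond resolution.  */
  struct timeval tv;
  if (gettimeofday (&tv, NULL) == 0)
    return (uint64_t) tv.tv_sec * 1000000000u + (uint64_t) tv.tv_usec * 1000u;
  return 0;  /* neither source works: time is not measured */
}
```

An elapsed time is then just the difference of two readings taken around
the region of interest.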

This removes a lot of code in timevar.cc:

gcc/timevar.cc | 167 ++++++---------------------------------------------------
gcc/timevar.h | 10 +---

2 files changed, 17 insertions(+), 160 deletions(-)

gcc/ChangeLog:

* timevar.cc (struct tms): Remove.
(RUSAGE_SELF): Remove.
(TICKS_PER_SECOND): Remove.
(USE_TIMES): Remove.
(HAVE_USER_TIME): Remove.
(HAVE_SYS_TIME): Remove.
(HAVE_WALL_TIME): Remove.
(USE_GETRUSAGE): Remove.
(USE_CLOCK): Remove.
(NANOSEC_PER_SEC): Remove.
(TICKS_TO_NANOSEC): Remove.
(CLOCKS_TO_NANOSEC): Remove.
(timer::named_items::push): Remove sys/user.
(get_time): Remove clock and times and getrusage code.
(timevar_accumulate): Remove sys/user.
(timevar_diff): Ditto.
(timer::validate_phases): Ditto.
(timer::print_row): Ditto.
(timer::all_zero): Ditto.
(timer::print): Ditto.
(make_json_for_timevar_time_def): Ditto.
* timevar.h (struct timevar_time_def): Ditto.

Remove vectorizer finish_cost wrapper

The inline function wraps the vector_cost class API and is no longer
a good representation of the query style of that class, which also
makes it difficult to extend.

* tree-vectorizer.h (finish_cost): Inline everywhere and remove.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Inline finish_cost.
* tree-vect-slp.cc (vect_bb_vectorization_profitable_p): Likewise.

Fix function multiversioning dispatcher link error with LTO

We forgot to apply DECL_EXTERNAL to __init_cpu_features_resolver decl. When
building with LTO, the linker cannot find the
__init_cpu_features_resolver.lto_priv* symbol, causing the link error.

This patch fixes this by setting DECL_EXTERNAL on the decl. To avoid a
used-but-never-defined warning for this symbol, we also set TREE_PUBLIC on the
decl and mark it as having hidden visibility. The __aarch64_cpu_features
identifier is fixed in the same way.

Minimal steps to reproduce the bug:

echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
gcc -flto -c 1.c 2.c
gcc -flto main.c 1.o 2.o

Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/aarch64/aarch64.cc (dispatch_function_versions): Adding
DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
__init_cpu_features_resolver and __aarch64_cpu_features.

c: Diagnose char argument to __builtin_stdc_*

When working on __builtin_stdc_rotate_*, I've noticed that while the
second argument to those is explicitly allowed to have char type,
the first argument to all the stdc_* type-generic functions is
- standard unsigned integer type, excluding bool;
- extended unsigned integer type;
- or, bit-precise unsigned integer type whose width matches a standard
or extended integer type, excluding bool.
but the __builtin_stdc_* lowering code was diagnosing just
!INTEGRAL_TYPE_P
ENUMERAL_TYPE
BOOLEAN_TYPE
!TYPE_UNSIGNED
Now, with -funsigned-char plain char type is TYPE_UNSIGNED, yet it isn't
allowed because it isn't standard unsigned integer type, nor
extended unsigned integer type, nor bit-precise unsigned integer type.

The following patch diagnoses char arguments and adds testsuite coverage
for that.

Or should I make it a pedwarn instead?

2024-10-30 Jakub Jelinek <jakub@redhat.com>

gcc/c/
* c-parser.cc (c_parser_postfix_expression): Diagnose if
first __builtin_stdc_* argument has char type even when
-funsigned-char.
gcc/testsuite/
* gcc.dg/builtin-stdc-bit-3.c: New test.
* gcc.dg/builtin-stdc-rotate-3.c: New test.

[RISC-V] Aggressively hoist VXRM assignments

So a while back I was looking at pixel_avg for RISC-V where we try to
use vaaddu for the halfword-ceiling-average step.  The problem with
vaaddu is that you must set VXRM to a suitable rounding mode as it has
an undetermined state at function entry or after a function call.

It turns out some designs will fully flush their pipelines on a write to
VXRM which you can imagine is incredibly expensive.

VXRM assignments are handled by an LCM based algorithm to find "optimal"
placement points based on what insns in the stream need VXRM assignments
and the particular mode they need.

Unfortunately in pixel_avg an LCM algorithm only allows hoisting out of
the innermost loop, but not the outer loop.  The core issue is that LCM
does not allow any speculation and there are paths which would bypass
the inner loop (which don't actually trigger at runtime IIRC).

The expectation is that VXRM assignments should be exceedingly rare and
needing more than one mode even rarer.  So hoisting more aggressively
seems like a reasonable thing to do, but we don't want to burn too much
time trying to do something fancy.

So what this patch does is scan the IL once collecting any VXRM needs.
If the current function has precisely one VXRM mode needed, then we
pretend (for the sake of LCM) that the first instruction in the function
also has that need.

By doing so the VXRM assignment is essentially anticipated everywhere in
the function.  The standard LCM algorithm is run and has enough
information to hoist the VXRM assignment more aggressively, most often
to the prologue.
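
A toy model of that scan, written as plain C over an array instead of the
RTL insn chain.  The names model_singleton_need and model_mode_needed, the
VXRM_NONE value and the integer mode encoding are all hypothetical; the real
code is in riscv.cc and works on insns.

```c
#include <stddef.h>

enum { VXRM_NONE = -1 };  /* this insn does not care about VXRM */

/* Return the one rounding mode needed anywhere in the function, or
   VXRM_NONE if no insn needs one or if two different modes are needed.  */
int model_singleton_need (const int *needs, size_t n)
{
  int mode = VXRM_NONE;
  for (size_t i = 0; i < n; i++)
    {
      if (needs[i] == VXRM_NONE)
        continue;
      if (mode == VXRM_NONE)
        mode = needs[i];
      else if (mode != needs[i])
        return VXRM_NONE;  /* more than one mode: do nothing special */
    }
  return mode;
}

/* Need reported to LCM for insn I: the first insn pretends to need the
   singleton mode, every other insn reports its real need.  */
int model_mode_needed (const int *needs, size_t n, size_t i)
{
  int single = model_singleton_need (needs, n);
  if (i == 0 && single != VXRM_NONE)
    return single;
  return needs[i];
}
```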

This helps the BPI in a measurable way (IIRC it was 2-3%).  It probably
helps some of the SiFive designs, but I've been told they still benefit
from the longer sequence of shifts & adds, hoisting just isn't enough
for those designs.  The Ventana design basically doesn't care where the
VXRM assignment is.  Point is we may want to have a tuning knob for the
patterns which need VXRM (vaadd[u], vasub[u]) at some point in the near
future.

Bootstrapped and regression tested on riscv64 and regression tested on
riscv32-elf and riscv64-elf.  We've been using this internally for a
while on spec as well.  Obviously I'll wait for the pre-commit
tester to do its thing.

gcc/
* config/riscv/riscv.cc (singleton_vxrm_need): New function.
(riscv_mode_needed): See if there is a singleton need and if so,
claim it happens on the first insn in the chain.

c++, contracts: Only check contracts attributes [PR116607].

The ICE described in the PR is caused by not filtering out non-
contract attributes before making the has_active_contract_condition
test. Fixed, as suggested by Andrew Pinski, by just using the
existing CONTRACT_CHAIN () macro to advance through the list.

PR c++/116607

gcc/cp/ChangeLog:

* contracts.cc (has_active_contract_condition): Use the
CONTRACT_CHAIN macro to advance through the attribute list.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr116607.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

libstdc++: Define config macros for additional IEEE formats

Some targets use IEEE binary64 for both double and long double, which
means we could use memmove to optimize a std::copy from a range of
double to a range of long double. We currently have no config macro to
detect when long double is binary64, so add that to <bits/c++config.h>.

This also adds config macros for the case where double and long double
both use the same binary32 format as float, which is true for the avr
target. No specializations of __memcpyable for that case are added by
this patch, but they could be added later.
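
The idea can be sketched in C with <float.h>.  The macro
LDOUBLE_MATCHES_DOUBLE and the function copy_double_to_ldouble are
hypothetical stand-ins, not the libstdc++ macros or the __memcpyable trait,
and comparing mantissa width, exponent range and size is only an
approximation of "same format".

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Nonzero when long double has the same precision and exponent range as
   double (hypothetical check, evaluated at compile time).  */
#define LDOUBLE_MATCHES_DOUBLE \
  (LDBL_MANT_DIG == DBL_MANT_DIG && LDBL_MAX_EXP == DBL_MAX_EXP)

void copy_double_to_ldouble (long double *dst, const double *src, size_t n)
{
  if (LDOUBLE_MATCHES_DOUBLE && sizeof (long double) == sizeof (double))
    /* Same representation: a raw byte copy is enough.  */
    memmove (dst, src, n * sizeof (double));
  else
    /* Different representation: convert element by element.  */
    for (size_t i = 0; i < n; i++)
      dst[i] = src[i];
}
```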

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DOUBLE_IS_IEEE_BINARY32):
Define.
(_GLIBCXX_LDOUBLE_IS_IEEE_BINARY64): Define.
(_GLIBCXX_LDOUBLE_IS_IEEE_BINARY32): Define.
* include/bits/cpp_type_traits.h (__memcpyable): Define
specializations when double and long double are compatible.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Define __memcpyable<float*, _Float32*> as true

This allows optimizing copying ranges of floating-point types when they
have the same size and representation, e.g. between _Float32 and float
when we know that float uses the same IEEE binary32 format as _Float32.

On some targets double and long double both use IEEE binary64 format so
we could enable memcpy between those types, but we don't have existing
macros to check for that case.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__memcpyable): Add
specializations for compatible floating-point types.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

Fix ICE due to subreg:us_truncate.

force_operand ICEs when its input is (subreg:DI (us_truncate:V8QI)),
probably because that is an invalid rtx, so refine the backend patterns
to avoid it.

gcc/ChangeLog:

PR target/117318
* config/i386/sse.md (*avx512vl_<code>v2div2qi2_mask_store_1):
Rename to ..
(avx512vl_<code>v2div2qi2_mask_store_1): .. this.
(avx512vl_<code>v2div2qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v8qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v8qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v8qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4hi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4hi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2hi2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2hi2_mask_store_1): .. this.
(avx512vl_<code>v2div2hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2si2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2si2_mask_store_1): .. this.
(avx512vl_<code>v2div2si2_mask_store_2): Change to
define_expand.
(*avx512f_<code>v8div16qi2_mask_store_1): Rename to ..
(avx512f_<code>v8div16qi2_mask_store_1): .. this.
(avx512f_<code>v8div16qi2_mask_store_2): Change to
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117318.c: New test.

Fortran: fix several front-end memleaks

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_trans_class_init_assign): Free intermediate
gfc_expr's.
* trans.cc (get_final_proc_ref): Likewise.
(get_elem_size): Likewise.
(gfc_add_finalizer_call): Likewise.

arm: [MVE intrinsics] Remove unused builtins qualifiers

After the re-implementation of MVE vld/vst intrinsics, a few builtins
qualifiers became useless.

This patch removes them to restore bootstrap (otherwise the build
fails because of 'defined but not used' errors).

gcc/ChangeLog:

* config/arm/arm-builtins.cc (STRS_QUALIFIERS): Delete.
(STRU_QUALIFIERS): Delete.
(STRS_P_QUALIFIERS): Delete.
(STRU_P_QUALIFIERS): Delete.
(LDRS_QUALIFIERS): Delete.
(LDRU_QUALIFIERS): Delete.
(LDRS_Z_QUALIFIERS): Delete.
(LDRU_Z_QUALIFIERS): Delete.

Remove dead part of bool pattern recognition

Given we no longer want vcond[u]{,_eq} and VEC_COND_EXPR or COND_EXPR
with embedded GENERIC comparisons the whole check_bool_pattern
and adjust_bool_stmts machinery is dead. It is effectively dead
after r15-4713-g0942bb85fc5573 and the following patch removes it.

* tree-vect-patterns.cc (check_bool_pattern): Remove.
(adjust_bool_pattern_cast): Likewise.
(adjust_bool_pattern): Likewise.
(sort_after_uid): Likewise.
(adjust_bool_stmts): Likewise.
(vect_recog_bool_pattern): Remove calls to check_bool_pattern
and fold as if it returns false.

[MAINTAINERS] Add myself to write after approval and DCO.

ChangeLog:

* MAINTAINERS: Add myself to write after approval and DCO.

function: Call do_pending_stack_adjust in assign_parms [PR117296]

Functions called by assign_parms call emit_block_move in two places; on
some targets those can be expanded as calls and can result in pending
stack adjustments.

Now, during expansion we normally call do_pending_stack_adjust at the end
of expansion of each basic block or before emitting code that will branch
and/or has labels, and when emitting labels we assert that there are no
pending stack adjustments.

assign_parms is expanded before the first basic block and if the first
basic block starts with a label and at least one of those emit_block_move
calls resulted in the need of pending stack adjustments, we ICE when
emitting that label.

The following patch fixes that by calling do_pending_stack_adjust after
the assign_parms potential emit_block_move calls.

2024-10-30 Jakub Jelinek <jakub@redhat.com>

PR target/117296
* function.cc (assign_parms): Call do_pending_stack_adjust.

* gcc.target/i386/pr117296.c: New test.

genmatch: Fix build on hppa64-hpux [PR117348]

Apparently autoconf defines the HAVE_DECL_* macros to 0
rather than not defining them at all, so defined(HAVE_DECL_FMEMOPEN)
test doesn't do much.

The following patch fixes it by testing HAVE_DECL_FMEMOPEN
for being non-zero instead.
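
The difference between the two tests can be reproduced with a macro that is
defined to 0, which is what AC_CHECK_DECLS produces when the declaration is
missing.  HAVE_DECL_EXAMPLE and both function names are hypothetical.

```c
/* Simulate autoconf's output for a declaration that was not found.  */
#define HAVE_DECL_EXAMPLE 0

int guarded_by_defined (void)
{
#if defined (HAVE_DECL_EXAMPLE)
  return 1;  /* taken, even though the value is 0 */
#else
  return 0;
#endif
}

int guarded_by_value (void)
{
#if HAVE_DECL_EXAMPLE
  return 1;
#else
  return 0;  /* taken, because the macro expands to 0 */
#endif
}
```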

2024-10-30 Jakub Jelinek <jakub@redhat.com>

PR middle-end/117348
* genmatch.cc: Replace defined(HAVE_DECL_FMEMOPEN)
test with HAVE_DECL_FMEMOPEN.

Fortran: Move pr115070.f90 to ieee directory [PR117335].

2024-10-30 Paul Thomas <pault@gcc.gnu.org>

gcc/testsuite/
PR fortran/117335
* gfortran.dg/pr115070.f90: Delete.
* gfortran.dg/ieee/pr115070.f90: Moved to ieee directory to
prevent failures on incompatible architectures.

i386: Use assign_stack_temp instead of assign_386_stack_local with SLOT_TEMP

It is better to use assign_stack_temp instead of assign_386_stack_local
with SLOT_TEMP because assign_stack_temp also shares sub-space of stack
slots (e.g. HImode temp shares stack slot with SImode stack slot).

Use assign_386_stack_local only for special stack slots (SLOT_STV_TEMP that
can be nested inside other stack temp access, SLOT_FLOATxFDI_387 that has
relaxed alignment constraint) or slots that can't be shared (SLOT_CW_*).

The patch removes SLOT_TEMP. assign_stack_temp should be used instead.

gcc/ChangeLog:

* config/i386/i386.h (enum ix86_stack_slot): Remove SLOT_TEMP.
* config/i386/i386-expand.cc (ix86_expand_builtin)
<case IX86_BUILTIN_LDMXCSR>: Use assign_stack_temp instead of
assign_386_stack_local with SLOT_TEMP.
<case IX86_BUILTIN_LDMXCSR>: Ditto.
(ix86_expand_divmod_libfunc): Ditto.
* config/i386/i386.md (floatunssi<mode>2): Ditto.
* config/i386/sync.md (atomic_load<mode>): Ditto.
(atomic_store<mode>): Ditto.

c: Add C2Y N3370 - Case range expressions support [PR117021]

The following patch adds the C2Y N3370 paper support.
We had the case ranges as a GNU extension for decades, so this patch
simply:
1) adds different diagnostics when it is used in C (depending on flag_isoc2y
   and pedantic and warn_c23_c2y_compat)
2) emits a pedwarn in C if, in a range, conversion changes the value of
   the low or high bound, and in that case doesn't emit -Woverflow and
   similar warnings anymore if the pedwarn has been diagnosed
3) changes the handling of empty ranges both in C and C++; previously
   we just warned but let the values be still looked up in the splay
   tree/entered into it (and let only gimplification throw away those
   empty cases), so e.g. case -6 ... -8: break; case -6: break;
   complained about duplicate case label.  But that actually isn't
   duplicate case label, case -6 ... -8: stands for nothing at all
   and that is how it is treated later on (thrown away)
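
For readers who have not used the syntax, a minimal case-range example
follows.  The function classify is hypothetical, and the letter ranges
assume an ASCII-compatible execution character set.

```c
/* Hypothetical classifier using case ranges (long a GNU extension, now
   covered by N3370).  */
int classify (int c)
{
  switch (c)
    {
    case '0' ... '9':
      return 1;  /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
      return 2;  /* letter, assuming contiguous letter codes */
    default:
      return 0;
    }
}
```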

2024-10-30  Jakub Jelinek  <jakub@redhat.com>

PR c/117021
gcc/c-family/
* c-common.cc (c_add_case_label): Emit different diagnostics for C
on case ranges.  Diagnose for C using pedwarn conversions of range
expressions changing value and don't emit further conversion
diagnostics if the pedwarn has been diagnosed.  For empty ranges
bail out after emitting warning, don't add anything into splay
trees nor add a CASE_LABEL_EXPR.
gcc/testsuite/
* gcc.dg/switch-6.c: Expect different diagnostics.  Add -std=gnu23
to dg-options.
* gcc.dg/switch-7.c: Expect different diagnostics.  Add -std=c23
to dg-options.
* gcc.dg/gnu23-switch-1.c: New test.
* gcc.dg/gnu23-switch-2.c: New test.
* gcc.dg/c23-switch-1.c: New test.
* gcc.dg/c2y-switch-1.c: New test.
* gcc.dg/c2y-switch-2.c: New test.
* gcc.dg/c2y-switch-3.c: New test.

testsuite: Adjust AVX10.2 check_effective_target

Since Binutils hasn't fully merged all AVX10.2 insts, testing only
one inst/intrin in AVX10.2 is not sufficient for check_effective_target.
Like APX_F, use inline asm to do the target check.

gcc/testsuite/ChangeLog:

PR target/117301
* lib/target-supports.exp (check_effective_target_avx10_2):
Use inline asm instead of intrin for check_effective_target.
(check_effective_target_avx10_2_512): Ditto.

RISC-V: Add testcases for unsigned .SAT_SUB form 2 with IMM = 1.

form2:
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{                                       \
  return x >= (T)IMM ? x - (T)IMM : 0;  \
}

Passed the rv64gcv regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_sub_imm-run-5.c: add run case for imm=1.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8_1.c: New test.

Match: Simplify (x != 0 ? x + ~0 : 0) to (x - x != 0).

When the imm operand op1=1 in the unsigned scalar sat_sub form2 below,
we can simplify (x != 0 ? x + ~0 : 0) to (x - (x != 0)), thereby eliminating
a branch instruction.  This simplification also applies to signed integers.
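
The equivalence of the two forms is easy to check by hand for uint8_t.  The
two function names below are hypothetical and only restate the expressions
from the text, with the subtraction written as x - (x != 0).

```c
#include <stdint.h>

/* Original shape: a conditional that subtracts 1 unless x is 0.  */
uint8_t sat_sub1_branchy (uint8_t x)
{
  return x != 0 ? (uint8_t) (x + ~0) : 0;
}

/* Simplified shape: subtract the comparison result, which is 0 or 1.  */
uint8_t sat_sub1_branchless (uint8_t x)
{
  return (uint8_t) (x - (x != 0));
}
```

Both return 0 for x == 0 and x - 1 for every other value.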

Form2:
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{                                       \
  return x >= (T)IMM ? x - (T)IMM : 0;  \
}

Take below form 2 as example:
DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  uint8_t _1;
  uint8_t _3;

  <bb 2> [local count: 1073741824]:
  if (x_2(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  _3 = x_2(D) + 255;

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <x_2(D)(2), _3(3)>
  return _1;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
beq a0,zero,.L2
addiw a0,a0,-1
andi a0,a0,0xff
.L2:
ret

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  _Bool _1;
  unsigned char _2;
  uint8_t _4;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) != 0;
  _2 = (unsigned char) _1;
  _4 = x_3(D) - _2;
  return _4;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
snez a5,a0
subw a0,a0,a5
andi a0,a0,0xff
ret

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Simplify (x != 0 ? x + ~0 : 0) to (x - x != 0).

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-44.c: New test.
* gcc.dg/tree-ssa/phi-opt-45.c: New test.

Daily bump.

Revert "Simplify switch bit test clustering algorithm"

This reverts commit 3d06e9c3e07e13eab715e19dafbcfc1a0b7e43d6.

diagnostics: support multiple output formats simultaneously [PR116613]

This patch generalizes diagnostic_context so that rather than having
a single output format, it has a vector of zero or more.

It adds two new options:
-fdiagnostics-add-output=DIAGNOSTICS-OUTPUT-SPEC
-fdiagnostics-set-output=DIAGNOSTICS-OUTPUT-SPEC
which both take a new configuration syntax of the form SCHEME ("text" or
"sarif"), optionally followed by ":" and one or more KEY=VALUE pairs,
in this form:

  <SCHEME>
  <SCHEME>:<KEY>=<VALUE>
  <SCHEME>:<KEY>=<VALUE>,<KEY2>=<VALUE2>
  ...etc

where each SCHEME supports some set of keys.  For example, it's now
possible to use:

  -fdiagnostics-add-output=sarif:version=2.1,file=foo.2.1.sarif \
  -fdiagnostics-add-output=sarif:version=2.2-prerelease,file=foo.2.2.sarif

to add a pair of outputs, each writing to a different file, using
versions 2.1 and 2.2 of the SARIF standard respectively, whilst also
emitting the classic text form of the diagnostics to stderr.
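
To make the SCHEME[:KEY=VALUE,...] shape concrete, here is a minimal parser
sketch.  It is not the code in opts-diagnostic.cc; struct output_spec,
parse_output_spec and the fixed-size limits are assumptions made for
illustration, and the sketch does not handle values that themselves contain
a comma.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAIRS 8
#define MAX_LEN 64

/* Hypothetical container for one parsed output spec.  */
struct output_spec
{
  char scheme[MAX_LEN];
  char keys[MAX_PAIRS][MAX_LEN];
  char values[MAX_PAIRS][MAX_LEN];
  int num_pairs;
};

/* Copy LEN bytes of SRC into DST (capacity MAX_LEN) and NUL-terminate.
   Returns 0 if the piece does not fit.  */
static int copy_piece (char *dst, const char *src, size_t len)
{
  if (len >= MAX_LEN)
    return 0;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 1;
}

/* Parse "<SCHEME>[:<KEY>=<VALUE>[,<KEY>=<VALUE>...]]" into OUT.
   Returns 1 on success and 0 on a spec this sketch considers malformed.  */
int parse_output_spec (const char *spec, struct output_spec *out)
{
  out->num_pairs = 0;
  const char *colon = strchr (spec, ':');
  size_t scheme_len = colon ? (size_t) (colon - spec) : strlen (spec);
  if (scheme_len == 0 || !copy_piece (out->scheme, spec, scheme_len))
    return 0;
  if (!colon)
    return 1;  /* Bare scheme, no key/value pairs.  */

  const char *p = colon + 1;
  for (;;)
    {
      const char *end = strchr (p, ',');
      size_t pair_len = end ? (size_t) (end - p) : strlen (p);
      const char *eq = (const char *) memchr (p, '=', pair_len);
      if (!eq || eq == p || out->num_pairs == MAX_PAIRS)
        return 0;
      size_t key_len = (size_t) (eq - p);
      if (!copy_piece (out->keys[out->num_pairs], p, key_len)
          || !copy_piece (out->values[out->num_pairs], eq + 1,
                          pair_len - key_len - 1))
        return 0;
      out->num_pairs++;
      if (!end)
        return 1;
      p = end + 1;
    }
}
```

With the first spec from the example above, the sketch yields the scheme
"sarif" and the pairs version=2.1 and file=foo.2.1.sarif.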

I hope the new syntax gives us room to potentially add new kinds of
output sink in the future (e.g. RPC notifications), and to add new
key/value pairs as needed by the different sinks.

Implementation-wise, the diagnostic_context's m_printer which previously
was used directly by the single output format now becomes a "reference
printer", created by the client (such as the frontend), with defaults
modified by command-line options.  Each of the multiple output sinks has
its own pretty_printer instance, created by cloning the context's
reference printer.

gcc/ChangeLog:
PR other/116613
* Makefile.in (OBJS-libcommon-target): Add opts-diagnostic.o.
* common.opt (fdiagnostics-add-output=): New.
(fdiagnostics-set-output=): New.
(diagnostics_output_format): Drop sarif-file-2.2-prerelease from
enum.
* common.opt.urls: Regenerate.
* diagnostic-buffer.h (diagnostic_buffer::~diagnostic_buffer): New.
(diagnostic_buffer::ensure_per_format_buffer): Rename to...
(diagnostic_buffer::ensure_per_format_buffers): ...this.
(diagnostic_buffer::m_per_format_buffer): Replace with...
(diagnostic_buffer::m_per_format_buffers): ...this, updating type.
* diagnostic-format-json.cc (json_output_format::update_printer):
New.
(json_output_format::follows_reference_printer_p): New.
(diagnostic_output_format_init_json): Drop redundant call to
set_path_format, as this is not a text output format.
* diagnostic-format-sarif.cc: Include "diagnostic-format-text.h".
(sarif_builder::set_printer): New.
(sarif_builder::sarif_builder): Add "printer" param and use it for
m_printer.
(sarif_builder::make_location_object::escape_nonascii_renderer::render):
Rather than using dc.m_printer, create a
diagnostic_text_output_format instance and use its printer.
(sarif_output_format::follows_reference_printer_p): New.
(sarif_output_format::update_printer): New.
(sarif_output_format::sarif_output_format): Pass in correct
printer to m_builder's ctor.
(diagnostic_output_format_init_sarif): Drop redundant call to
set_path_format, as this is not a text output format.  Replace
calls to pp_show_color and set_token_printer with call to
update_printer.  Drop redundant call to set_show_highlight_colors,
as this printer does not show colors.
(diagnostic_output_format_init_sarif_file): Split out file opening
into...
(diagnostic_output_format_open_sarif_file): ...this new function.
(make_sarif_sink): New.
(selftest::test_make_location_object): Provide a pp for the
builder.
* diagnostic-format-sarif.h
(diagnostic_output_format_open_sarif_file): New decl.
(make_sarif_sink): New decl.
* diagnostic-format-text.cc (diagnostic_text_output_format::dump):
Dump sm_follows_reference_printer.
(diagnostic_text_output_format::on_report_verbatim): New.
(diagnostic_text_output_format::follows_reference_printer_p): New.
(diagnostic_text_output_format::update_printer): New.
* diagnostic-format-text.h
(diagnostic_text_output_format::diagnostic_text_output_format):
Add optional "follows_reference_printer" param.
(diagnostic_text_output_format::on_report_verbatim): New decl.
(diagnostic_text_output_format::after_diagnostic): Drop "final".
(diagnostic_text_output_format::follows_reference_printer_p): New
decl.
(class diagnostic_text_output_format): Convert private members to
protected.
(diagnostic_text_output_format::m_follows_reference_printer): New
field.
* diagnostic-format.h
(diagnostic_output_format::on_report_verbatim): New vfunc.
(diagnostic_output_format::follows_reference_printer_p): New vfunc.
(diagnostic_output_format::update_printer): New vfunc.
(diagnostic_output_format::get_printer): Use m_printer rather than
a printer from m_context.
(diagnostic_output_format::diagnostic_output_format): Initialize
m_printer by cloning the context's printer.
(diagnostic_output_format::m_printer): New field.
* diagnostic-global-context.cc (verbatim): Reimplement in terms of
global_dc->report_verbatim, moving existing implementation to
diagnostic_text_output_format::on_report_verbatim.
(fnotice): Support multiple output sinks by using a new
global_dc->supports_fnotice_on_stderr_p.
* diagnostic-output-file.h
(diagnostic_output_file::diagnostic_output_file): New default ctor.
(diagnostic_output_file::operator=): Implement move assignment.
* diagnostic-path.cc (selftest::test_interprocedural_path_1): Pass
false for new param of text_output's ctor.
* diagnostic-show-locus.cc
(selftest::test_layout_x_offset_display_utf8): Use reference
printer.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_one_liner_fixit_remove): Likewise.
* diagnostic.cc: Include "pretty-print-urlifier.h".
(diagnostic_set_caret_max_width): Update for global_dc's m_printer
becoming reference printer.
(diagnostic_context::initialize): Update for m_printer becoming
m_reference_printer.  Use ::make_unique to create it.  Update for
m_output_format becoming m_output_sinks.
(diagnostic_context::color_init): Update the reference printer,
then update the printers for any output sinks that follow it.
(diagnostic_context::urls_init): Likewise.
(diagnostic_context::finish): Update comment.  Update for
m_output_format becoming m_output_sinks.  Update for m_printer
becoming m_reference_printer and use "delete" on it rather than
XDELETE.
(diagnostic_context::dump): Update for m_printer becoming
reference printer, and for multiple output sinks.
(diagnostic_context::set_output_format): Reimplement for
supporting multiple output sinks.
(diagnostic_context::get_output_format): Likewise.
(diagnostic_context::add_sink): New.
(diagnostic_context::supports_fnotice_on_stderr_p): New.
(diagnostic_context::set_pretty_printer): New.
(diagnostic_context::refresh_output_sinks): New.
(diagnostic_context::set_format_decoder): New.
(diagnostic_context::set_show_highlight_colors): New.
(diagnostic_context::set_prefixing_rule): New.
(diagnostic_context::report_diagnostic): Update to support
multiple output sinks.
(diagnostic_context::report_verbatim): New.
(diagnostic_context::emit_diagram): Update to support multiple
output sinks.
(diagnostic_context::error_recursion): Update to use
m_reference_printer.
(fancy_abort): Likewise.
(diagnostic_context::end_group): Update to support multiple
output sinks.
(diagnostic_output_format::dump): Implement.
(diagnostic_output_format::on_report_verbatim): Likewise.
(diagnostic_output_format_init): Drop
DIAGNOSTICS_OUTPUT_FORMAT_SARIF_FILE_2_2_PRERELEASE.
(diagnostic_context::set_diagnostic_buffer): Reimplement to
support multiple output sinks.
(diagnostic_context::clear_diagnostic_buffer): Likewise.
(diagnostic_context::flush_diagnostic_buffer): Likewise.
(diagnostic_buffer::diagnostic_buffer): Initialize
m_per_format_buffers.
(diagnostic_buffer::~diagnostic_buffer): New dtor.
(diagnostic_buffer::dump): Reimplement to support multiple output
sinks.
(diagnostic_buffer::empty_p): Likewise.
(diagnostic_buffer::move_to): Likewise.
(diagnostic_buffer::ensure_per_format_buffer): Likewise, renaming
to...
(diagnostic_buffer::ensure_per_format_buffers): ...this.
* diagnostic.h
(DIAGNOSTICS_OUTPUT_FORMAT_SARIF_FILE_2_2_PRERELEASE): Delete.
(class diagnostic_context): Add friend class diagnostic_buffer.
(diagnostic_context::set_pretty_printer): New decl.
(diagnostic_context::refresh_output_sinks): New decl.
(diagnostic_context::report_verbatim): New decl.
(diagnostic_context::get_output_format): Drop.
(diagnostic_context::set_show_highlight_colors): Drop body.
(diagnostic_context::set_format_decoder): New decl.
(diagnostic_context::set_prefixing_rule): New decl.
(diagnostic_context::clone_printer): Reimplement.
(diagnostic_context::get_reference_printer): New accessor.
(diagnostic_context::add_sink): New decl.
(diagnostic_context::supports_fnotice_on_stderr_p): New decl.
(diagnostic_context::m_printer): Replace with...
(diagnostic_context::m_reference_printer): ...this, and make
private.
(diagnostic_context::m_output_format): Replace with...
(diagnostic_context::m_output_sinks): ...this.
(diagnostic_format_decoder): Delete.
(diagnostic_prefixing_rule): Delete.
(diagnostic_ready_p): Delete.
* doc/invoke.texi: Document -fdiagnostics-add-output= and
-fdiagnostics-set-output=.
* gcc.cc: Include "opts-diagnostic.h".
(driver_handle_option): Handle cases OPT_fdiagnostics_add_output_
and OPT_fdiagnostics_set_output_.
* opts-diagnostic.cc: New file.
* opts-diagnostic.h (handle_OPT_fdiagnostics_add_output_): New decl.
(handle_OPT_fdiagnostics_set_output_): New decl.
* opts-global.cc (init_options_once): Update for global_dc's
m_printer becoming reference printer.  Call
global_dc->refresh_output_sinks.
* opts.cc (common_handle_option): Replace use of
diagnostic_prefixing_rule with dc->set_prefixing_rule.  Handle
cases OPT_fdiagnostics_add_output_ and
OPT_fdiagnostics_set_output_.  Update for m_printer becoming
reference printer.
* selftest-diagnostic.cc
(selftest::test_diagnostic_context::test_diagnostic_context):
Update for m_printer becoming reference printer.
(test_diagnostic_context::test_show_locus): Likewise.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::opts_diagnostic_cc_tests.
* selftest.h (selftest::opts_diagnostic_cc_tests): New decl.
* simple-diagnostic-path.cc
(selftest::simple_diagnostic_path_cc_tests): Use reference
printer.
* toplev.cc (announce_function): Update for global_dc's m_printer
becoming reference printer.
(toplev::main): Likewise.
* tree-diagnostic.cc (tree_diagnostics_defaults): Replace use of
diagnostic_format_decoder with context->set_format_decoder.
* tree-diagnostic.h
(tree_dump_pretty_printer::tree_dump_pretty_printer): Update for
global_dc's m_printer becoming reference printer.
* tree.cc (escaped_string::escape): Likewise.
(selftest::test_escaped_strings): Likewise.

gcc/ada/ChangeLog:
PR other/116613
* gcc-interface/misc.cc (internal_error_function): Update for
m_printer becoming reference printer.

gcc/analyzer/ChangeLog:
PR other/116613
* analyzer-language.cc (on_finish_translation_unit): Update for
m_printer becoming reference printer.
* engine.cc (run_checkers): Likewise.
* program-point.cc (function_point::print_source_line): Likewise.

gcc/c-family/ChangeLog:
PR other/116613
* c-format.cc (selftest::test_type_mismatch_range_labels): Update
for m_printer becoming reference printer.
(selftest::test_type_mismatch_range_labels): Likewise.

gcc/c/ChangeLog:
PR other/116613
* c-objc-common.cc: Include "make-unique.h".
(c_initialize_diagnostics): Use unique_ptr for pretty_printer.
Use context->set_format_decoder.

gcc/cp/ChangeLog:
PR other/116613
* error.cc (cxx_initialize_diagnostics): Use unique_ptr for
pretty_printer.  Use context->set_format_decoder.
* module.cc (noisy_p): Update for global_dc's m_printer becoming
reference printer.

gcc/d/ChangeLog:
PR other/116613
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Update for
m_printer becoming reference printer.

gcc/fortran/ChangeLog:
PR other/116613
* error.cc (gfc_diagnostic_build_kind_prefix): Update for
global_dc's m_printer becoming reference printer.
(gfc_diagnostics_init): Replace usage of diagnostic_format_decoder
with global_dc->set_format_decoder.

gcc/jit/ChangeLog:
PR other/116613
* dummy-frontend.cc: Include "make-unique.h".
(class jit_diagnostic_listener): New.
(jit_begin_diagnostic): Update comment.
(jit_end_diagnostic): Drop call to add_diagnostic.
(jit_langhook_init): Set the output format to a new
jit_diagnostic_listener.
* jit-playback.cc (playback::context::add_diagnostic): Add "text"
param and use that rather than trying to get the text from a
pretty_printer.
* jit-playback.h (playback::context::add_diagnostic): Add "text"
param.

gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/analyzer_cpython_plugin.c (dump_refcnt_info):
Update for global_dc's m_printer becoming reference printer.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2.2.c: Replace usage
of -fdiagnostics-format=sarif-file-2.2-prerelease with
-fdiagnostics-set-output=sarif:version=2.2-prerelease.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c: Update for
global_dc's m_printer becoming reference printer.
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Update for
changes to output formats.
* gcc.dg/plugin/expensive_selftests_plugin.c: Update for
global_dc's m_printer becoming reference printer.
* gcc.dg/sarif-output/add-output-sarif-defaults.c: New test.
* gcc.dg/sarif-output/bad-binary-op.c: New test.
* gcc.dg/sarif-output/bad-binary-op.py: New support script.
* gcc.dg/sarif-output/multiple-outputs.c: New test.
* gcc.dg/sarif-output/multiple-outputs.py: New support script.
* lib/scansarif.exp (verify-sarif-file): Add an optional second
argument specifying the expected filename of the .sarif file.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

aarch64: Use canonicalize_comparison in ccmp expansion [PR117346]

While testing the patch for PR 85605 on aarch64, it was noticed that the
imm_choice_comparison.c test failed. This was because canonicalize_comparison
was not being called in the ccmp case. This can be reproduced without the patch
for PR 85605, as evidenced by the new testcase.

Bootstrapped and tested on aarch64-linux-gnu.

PR target/117346

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_gen_ccmp_first): Call
canonicalize_comparison before figuring out the cmp_mode/cc_mode.
(aarch64_gen_ccmp_next): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/imm_choice_comparison-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Simplify switch bit test clustering algorithm

The current switch bit test clustering enumerates all possible case
cluster combinations to find the ones that fit the bit test constraints
best. This causes performance problems with very large switches.

For bit test clustering which happens naturally in word sized chunks
I don't think such an expensive algorithm is really needed.

This patch implements a simple greedy algorithm that walks
the sorted list and examines word sized windows and tries
to cluster them.
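
The greedy idea can be sketched roughly as follows.  This is a simplified
model, not the code in bit_test_cluster::find_bit_tests: it only checks that
a window of sorted case values spans at most one word, and it ignores the
other conditions the real pass applies.  WORD_BITS, struct cluster and
greedy_bit_test_clusters are names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BITS 64  /* Assumed word size for the sketch.  */

struct cluster
{
  long low;
  long high;
};

/* Greedily partition the N sorted, distinct case values in VALS into
   clusters whose span (high - low + 1) fits in WORD_BITS.  OUT must have
   room for N entries.  Returns the number of clusters written.  */
size_t greedy_bit_test_clusters (const long *vals, size_t n,
                                 struct cluster *out)
{
  /* Precondition: strictly increasing input.  */
  for (size_t k = 1; k < n; k++)
    assert (vals[k - 1] < vals[k]);

  size_t count = 0;
  size_t i = 0;
  while (i < n)
    {
      size_t j = i;
      /* Extend the window while the next value still fits in one word.  */
      while (j + 1 < n && vals[j + 1] - vals[i] < WORD_BITS)
        j++;
      out[count].low = vals[i];
      out[count].high = vals[j];
      count++;
      i = j + 1;
    }
  return count;
}
```

Each value is visited once, so the walk is linear in the number of case
values, in contrast to enumerating all cluster combinations.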

Surprisingly the new algorithm gives consistently better clusters
for the examples I tried.

For example from the gcc bootstrap:

old: 0-15 16-31 96-175
new: 0-31 96-175

I'm not fully sure why that is, probably some bug in the old
algorithm? This even shows up in the test suite, where if-to-switch-6
can now generate a switch, as well as a case in switch-1.c.

I don't have a proof that the new algorithm is always as good or better,
but so far at least I don't see any counter examples.

It also fixes the excessive compile time in PR117091,
however this was already fixed by an earlier patch
that doesn't run clustering when no targets have multiple
values.

gcc/ChangeLog:

PR middle-end/117091
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Change clustering algorithm to simple greedy.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/if-to-switch-6.c: Allow condition chain.
* gcc.dg/tree-ssa/switch-1.c: Allow more bit tests.
* gcc.dg/pr21643.c: Use -fno-bit-tests
* gcc.target/aarch64/pr99988.c: Use -fno-bit-tests

Only do switch bit test clustering when multiple labels point to same bb

The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.

This fixes the switch part of PR117091 where all case labels are unique
however it doesn't address the performance problems for non unique
cases.
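
A rough sketch of such a quick check is below.  It is not the code in
tree-switch-conversion.cc; it models each case label by the id of the block
it jumps to, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* TARGETS[i] is the id of the block that case label I jumps to, with ids
   in [0, NUM_BLOCKS).  Returns the largest number of labels that share a
   single target block.  */
size_t max_labels_per_target (const int *targets, size_t num_labels,
                              size_t num_blocks)
{
  if (num_labels == 0 || num_blocks == 0)
    return 0;

  size_t *counts = (size_t *) calloc (num_blocks, sizeof *counts);
  assert (counts != NULL);

  size_t max = 0;
  for (size_t i = 0; i < num_labels; i++)
    {
      assert (targets[i] >= 0 && (size_t) targets[i] < num_blocks);
      size_t c = ++counts[targets[i]];
      if (c > max)
        max = c;
    }
  free (counts);
  return max;
}

/* Bit tests can only pay off when at least two labels share a target.  */
int bit_tests_worth_trying (const int *targets, size_t num_labels,
                            size_t num_blocks)
{
  return max_labels_per_target (targets, num_labels, num_blocks) > 1;
}
```

When every label has its own target the maximum is 1, and the clustering
step can be skipped entirely.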

gcc/ChangeLog:

PR middle-end/117091
* gimple-if-to-switch.cc (if_chain::is_beneficial): Update
find_bit_test call.
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Get max_c argument and bail out early if all case labels are
unique.
(switch_decision_tree::compute_cases_per_edge): Record number of
targets per label and return.
(switch_decision_tree::analyze_switch_statement): ... pass to
find_bit_tests.
* tree-switch-conversion.h: Update prototypes.

Disable -fbit-tests and -fjump-tables at -O0

gcc/ChangeLog:

* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* opts.cc (default_options_table): Ditto.

Fix miscompilation of function containing __builtin_unreachable

This is a wrong-code generation on the SPARC for a function containing
a call to __builtin_unreachable caused by the delay slot scheduling pass,
and more specifically the find_end_label function which has these lines:

  /* Otherwise, see if there is a label at the end of the function. If there
     is, it must be that RETURN insns aren't needed, so that is our return
     label and we don't have to do anything else.  */

The comment was correct 20 years ago but no longer is nowadays in the
presence of RTL epilogues and calls to __builtin_unreachable, so the
patch just removes the associated two lines of code:

  else if (LABEL_P (insn))
    *plabel = as_a <rtx_code_label *> (insn);

and otherwise contains just adjustments to the commentary.

gcc/
PR rtl-optimization/117327
* reorg.cc (find_end_label): Do not return a dangling label at the
end of the function and adjust commentary.

gcc/testsuite/
* gcc.c-torture/execute/20241029-1.c: New test.

aarch64: Remove unnecessary casts to rtx_code [PR117349]

In aarch64_gen_ccmp_first/aarch64_gen_ccmp_next, the casts
were no longer needed after r14-3412-gbf64392d66f291 which
changed the type of the arguments to rtx_code.

In aarch64_rtx_costs, they were no longer needed since
r12-4828-g1d5c43db79b7ea which changed the type of code
to rtx_code.

Pushed as obvious after a build/test for aarch64-linux-gnu.

gcc/ChangeLog:

PR target/117349
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Remove
unnecessary casts to rtx_code.
(aarch64_gen_ccmp_first): Likewise.
(aarch64_gen_ccmp_next): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c-family: Handle RAW_DATA_CST in complete_array_type [PR117313]

The following testcase ICEs, because
add_flexible_array_elts_to_size -> complete_array_type
is done only after braced_lists_to_strings which optimizes
RAW_DATA_CST surrounded by INTEGER_CST into a larger RAW_DATA_CST
covering even the boundaries, while I thought it is done before
that.
So, RAW_DATA_CST now can be the last constructor_elt in a CONSTRUCTOR
and so we need the function to take it into account (handle it as
RAW_DATA_CST standing for RAW_DATA_LENGTH consecutive elements).

The function wants to support both CONSTRUCTORs without indexes and with
them (for non-RAW_DATA_CST elts it was just adding 1 for the current
index).  So, if the RAW_DATA_CST elt has ce->index, we need to add
RAW_DATA_LENGTH (ce->value) - 1, while if it doesn't (and it isn't cnt == 0
case where curindex is 0), add that plus 1, i.e. RAW_DATA_LENGTH (ce->value).
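
The index arithmetic described above can be modelled as follows.  This is
only a sketch of the counting rule, not the code in complete_array_type:
struct elt stands in for a constructor_elt, raw_len for RAW_DATA_LENGTH, and
plain long values replace the tree arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one constructor element.  */
struct elt
{
  long index;    /* Explicit index, or -1 when the element has none.  */
  long raw_len;  /* 0 for an ordinary element, otherwise the number of
                    consecutive elements a raw-data element stands for.  */
};

/* Returns the highest array index covered by the N elements, or -1 when
   N is 0.  */
long max_covered_index (const struct elt *elts, size_t n)
{
  long curindex = 0;
  long maxindex = -1;
  for (size_t cnt = 0; cnt < n; cnt++)
    {
      const struct elt *ce = &elts[cnt];
      assert (ce->raw_len >= 0);
      if (ce->index >= 0)
        curindex = ce->index;  /* The index names the first covered slot.  */
      else if (cnt != 0)
        curindex += 1;         /* Step past the previous element.  */
      if (ce->raw_len > 0)
        curindex += ce->raw_len - 1;  /* Move to the block's last slot.  */
      if (curindex > maxindex)
        maxindex = curindex;
    }
  return maxindex;
}
```

For an unindexed raw-data element that is not first, the two steps add up to
raw_len, matching the "plus 1" case in the description; with an explicit
index only raw_len - 1 is added.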

2024-10-29  Jakub Jelinek  <jakub@redhat.com>

PR c/117313
gcc/c-family/
* c-common.cc (complete_array_type): For RAW_DATA_CST elements
advance curindex by RAW_DATA_LENGTH or one less than that if
ce->index is non-NULL.  Handle even the first element if
it is RAW_DATA_CST.  Formatting fix.
gcc/testsuite/
* c-c++-common/init-6.c: New test.

c++: printing AGGR_INIT_EXPR args

PR30854 was about wrongly dumping the dummy object argument to a
constructor; r126582 in 4.3 fixed that by skipping the first argument. But
not all functions called by AGGR_INIT_EXPR are constructors, as observed in
PR116634; we shouldn't skip for non-member functions. And let's combine the
printing code for CALL_EXPR and AGGR_INIT_EXPR.

This doesn't make us accept the ill-formed 116634 testcase again with a
pedwarn, just fixes the diagnostic issue.

PR c++/30854
PR c++/116634

gcc/cp/ChangeLog:

* error.cc (dump_aggr_init_expr_args): Remove.
(dump_call_expr_args): Handle AGGR_INIT_EXPR.
(dump_expr): Combine AGGR_INIT_EXPR and CALL_EXPR cases.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C: Adjust
diagnostic.
* g++.dg/diagnostic/aggr-init1.C: New test.

[RISC-V] RISC-V: Add implication for M extension.

That M implies Zmmul.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: M implies Zmmul.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Add _zmmul1p0 to arch string.
* gcc.target/riscv/attribute-16.c: Ditto.
* gcc.target/riscv/attribute-17.c: Ditto.
* gcc.target/riscv/attribute-18.c: Ditto.
* gcc.target/riscv/attribute-19.c: Ditto.
* gcc.target/riscv/pr110696.c: Ditto.
* gcc.target/riscv/target-attr-01.c: Ditto.
* gcc.target/riscv/target-attr-02.c: Ditto.
* gcc.target/riscv/target-attr-03.c: Ditto.
* gcc.target/riscv/target-attr-04.c: Ditto.
* gcc.target/riscv/target-attr-08.c: Ditto.
* gcc.target/riscv/target-attr-11.c: Ditto.
* gcc.target/riscv/target-attr-14.c: Ditto.
* gcc.target/riscv/target-attr-15.c: Ditto.
* gcc.target/riscv/target-attr-16.c: Ditto.
* gcc.target/riscv/rvv/base/pr114352-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
* gcc.dg/pr90838.c: Fix search string for rv64.

Co-Authored-By: Jeff Law <jlaw@ventanamicro.com>

testcase: Add testcase for tree-optimization/117341

Even though PR 117341 was a duplicate of PR 116768, it does not hurt to
have another testcase, this time in C++.
The testcase is self-contained and does not directly use libstdc++
except for operator new (it does not even call delete).

Tested on x86_64-linux-gnu with it working.

PR tree-optimization/117341

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr117341-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PATCH 2/2] RISC-V:Add intrinsic cases for the CMOs extensions

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-32.c: New test.
* gcc.target/riscv/cmo-64.c: New test.

[PATCH 1/2] RISC-V:Add intrinsic support for the CMOs extensions

gcc/ChangeLog:

* config.gcc: Add riscv_cmo.h.
* config/riscv/riscv_cmo.h: New file.

RISC-V: Add testcases for form 1 of MASK_LEN_STRIDED_LOAD{STORE}

Form 1:
  void __attribute__((noinline))                                        \
  vec_strided_load_store_##T##_form_1 (T *restrict out, T *restrict in, \
       long stride, size_t size)        \
  {                                                                     \
    for (size_t i = 0; i < size; i++)                                   \
      out[i * stride] = in[i * stride];                                 \
  }

The following test suites pass with this patch:
* The riscv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add strided folder.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_data.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_run.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

RISC-V: Implement the MASK_LEN_STRIDED_LOAD{STORE}

This patch implements MASK_LEN_STRIDED_LOAD{STORE} in the RISC-V
backend by leveraging the vector strided load/store insns.

For example:
void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
    for (int i = 0; i < n; i++)
      a[i*stride] = b[i*stride] + 100;
}

Before this patch:
    vsetvli a5,a3,e32,m1,ta,ma
    vluxei64.v  v1,(a1),v4
    mul a4,a2,a5
    sub a3,a3,a5
    vadd.vv v1,v1,v2
    vsuxei64.v  v1,(a0),v4
    add a1,a1,a4
    add a0,a0,a4

After this patch:
    vsetvli a5,a3,e32,m1,ta,ma
    vlse32.v    v1,0(a1),a2
    mul a4,a2,a5
    sub a3,a3,a5
    vadd.vv v1,v1,v2
    vsse32.v    v1,0(a0),a2
    add a1,a1,a4
    add a0,a0,a4

The following test suites pass with this patch:
* The riscv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (mask_len_strided_load_<mode>): Add
new pattern for MASK_LEN_STRIDED_LOAD.
(mask_len_strided_store_<mode>): Ditto but for store.
* config/riscv/riscv-protos.h (expand_strided_load): Add new
func decl to expand strided load.
(expand_strided_store): Ditto but for store.
* config/riscv/riscv-v.cc (expand_strided_load): Add new
func impl to expand strided load.
(expand_strided_store): Ditto but for store.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

RISC-V: Adjust the gather-scatter testcases due to middle-end change

Now that we have MASK_LEN_STRIDED_LOAD{STORE} in the middle-end, the
strided cases need to be adjusted for the IR check.

The following test suites pass with this patch:
* The riscv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
Adjust IR for MASK_LEN_LOAD check.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
Ditto but for store.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Vect: Introduce MASK_LEN_STRIDED_LOAD{STORE} to loop vectorizer

This patch allows generation of MASK_LEN_STRIDED_LOAD{STORE} IR
for invariant stride memory access.  For example:

void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
    for (int i = 0; i < n; i++)
      a[i*stride] = b[i*stride] + 100;
}

Before this patch:
  _73 = .SELECT_VL (ivtmp_71, POLY_INT_CST [4, 4]);
  _52 = _54 * _73;
  vect__5.16_61 = .MASK_LEN_GATHER_LOAD (vectp_b.14_59, _58, 4, { 0, ... }, { -1, ... }, _73, 0);
  vect__7.17_63 = vect__5.16_61 + { 100, ... };
  .MASK_LEN_SCATTER_STORE (vectp_a.18_67, _58, 4, vect__7.17_63, { -1, ... }, _73, 0);
  vectp_b.14_60 = vectp_b.14_59 + _52;
  vectp_a.18_68 = vectp_a.18_67 + _52;
  ivtmp_72 = ivtmp_71 - _73;

After this patch:
  _70 = .SELECT_VL (ivtmp_68, POLY_INT_CST [4, 4]);
  _52 = _54 * _70;
  vect__5.16_58 = .MASK_LEN_STRIDED_LOAD (vectp_b.14_56, _55, { 0, ... }, { -1, ... }, _70, 0);
  vect__7.17_60 = vect__5.16_58 + { 100, ... };
  .MASK_LEN_STRIDED_STORE (vectp_a.18_64, _55, vect__7.17_60, { -1, ... }, _70, 0);
  vectp_b.14_57 = vectp_b.14_56 + _52;
  vectp_a.18_65 = vectp_a.18_64 + _52;
  ivtmp_69 = ivtmp_68 - _70;

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Handle
MASK_LEN_STRIDED_LOAD{STORE} after supported check.
(vectorizable_store): Generate MASK_LEN_STRIDED_STORE when the offset
of gather is not vector type.
(vectorizable_load): Ditto but for load.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Internal-fn: Introduce new IFN MASK_LEN_STRIDED_LOAD{STORE}

This patch introduces new IFNs for strided load and store.

LOAD:  v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
STORE: MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)

The IFNs target code similar to the example below:

void foo (int * a, int * b, int stride, int n)
{
  for (int i = 0; i < n; i++)
    a[i * stride] = b[i * stride];
}
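
A scalar reference model of the two IFNs, following the operand order given
above, might look like the sketch below.  It assumes that the stride is a
byte offset between consecutive elements, that at most len + bias elements
are accessed, and that lanes with a zero mask are left untouched; these are
assumptions of the sketch, as are the function names and the fixed int32_t
element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of: v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias).
   STRIDE is assumed to be in bytes.  Inactive lanes of V keep their old
   contents in this model.  */
void model_strided_load (int32_t *v, const void *ptr, ptrdiff_t stride,
                         const uint8_t *mask, size_t len, ptrdiff_t bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  assert (active >= 0);
  for (ptrdiff_t i = 0; i < active; i++)
    if (mask[i])
      memcpy (&v[i], (const char *) ptr + i * stride, sizeof v[i]);
}

/* Model of: MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias).  */
void model_strided_store (void *ptr, ptrdiff_t stride, const int32_t *v,
                          const uint8_t *mask, size_t len, ptrdiff_t bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  assert (active >= 0);
  for (ptrdiff_t i = 0; i < active; i++)
    if (mask[i])
      memcpy ((char *) ptr + i * stride, &v[i], sizeof v[i]);
}
```

Under these assumptions, one vector iteration of the loop in foo corresponds
to a model_strided_load from b followed by a model_strided_store to a, both
with a byte stride of stride * sizeof (int).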

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* internal-fn.cc (strided_load_direct): Add new define direct
for strided load.
(strided_store_direct): Ditto but for store.
(expand_strided_load_optab_fn): Add new func to expand the IFN
MASK_LEN_STRIDED_LOAD in middle-end.
(expand_strided_store_optab_fn): Ditto but for store.
(direct_strided_load_optab_supported_p): Add define for stride
load optab supported.
(direct_strided_store_optab_supported_p): Ditto but for store.
(internal_fn_len_index): Add strided load/store len index.
(internal_fn_mask_index): Ditto but for mask.
(internal_fn_stored_value_index): Add strided store value index.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Add new IFN for
strided load.
(MASK_LEN_STRIDED_STORE): Ditto but for store.
* optabs.def (mask_len_strided_load_optab): Add strided load optab.
(mask_len_strided_store_optab): Add strided store optab.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Remove dead vect_recog_mixed_size_cond_pattern

vect_recog_mixed_size_cond_pattern only applies to COMPARISON_CLASS_P
rhs1 COND_EXPRs which no longer appear - the following removes it.
Its testcases still pass, I believe the situation is mitigated by
bool pattern handling of the compare use in COND_EXPRs.

* tree-vect-patterns.cc (type_conversion_p): Remove.
(vect_recog_mixed_size_cond_pattern): Likewise.
(vect_vect_recog_func_ptrs): Remove vect_recog_mixed_size_cond_pattern
entry.

Remove dead code in vectorizer pattern recog

The following removes the code path in vect_recog_mask_conversion_pattern
dealing with comparisons in COND_EXPRs. That can no longer happen.

* tree-vect-patterns.cc (vect_recog_mask_conversion_pattern):
Remove COMPARISON_CLASS_P rhs1 of COND_EXPR case and assert
it doesn't happen.

libstdc++: Fix complexity of drop_view::begin() const [PR112641]

Views are required to have an amortized O(1) begin(), but our drop_view's
const begin overload is O(n) for non-common ranges with a non-sized
sentinel. This patch reimplements it so that it's O(1) always. See
also LWG 4009.

PR libstdc++/112641

libstdc++-v3/ChangeLog:

* include/std/ranges (drop_view::begin): Reimplement const
overload so that it's O(1) always.
* testsuite/std/ranges/adaptors/drop.cc (test10): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

jit: fix leak of pending_assemble_externals_set [PR117275]

My recent r15-4580-g779c0390e3b57d fix for resetting state in
varasm.cc introduced some noise to "make selftest-valgrind" and,
presumably, a memory leak in libgccjit:

==2462086== 160 (56 direct, 104 indirect) bytes in 1 blocks are definitely lost in loss record 248 of 352
==2462086==    at 0x5270E7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==2462086==    by 0x1D1EB89: init_varasm_once() (varasm.cc:6806)
==2462086==    by 0x181C845: backend_init() (toplev.cc:1826)
==2462086==    by 0x181D41A: do_compile() (toplev.cc:2193)
==2462086==    by 0x181D99C: toplev::main(int, char**) (toplev.cc:2371)
==2462086==    by 0x378391D: main (main.cc:39)

Fixed thusly.

gcc/ChangeLog:
PR jit/117275
* varasm.cc (process_pending_assemble_externals): Reset
pending_assemble_externals_set to nullptr after deleting it.
(varasm_cc_finalize): Delete pending_assemble_externals_set.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
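
The delete-and-reset pattern described in the ChangeLog entry above can be sketched as follows. This is a minimal illustration, not the varasm.cc code: `pending_set`, `init_once`, `process_pending` and `finalize` are hypothetical stand-ins, and `std::set<int>` replaces whatever container the real code uses.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a global set that is allocated once per
// compile and must not outlive it.
static std::set<int> *pending_set = nullptr;

// Allocate the set at start-up (analogous to a one-time init hook).
void init_once() {
  assert(pending_set == nullptr && "previous run was not cleaned up");
  pending_set = new std::set<int>;
}

// Consume the set, then free it and reset the pointer so that later
// code never sees a dangling value.
void process_pending() {
  if (!pending_set)
    return;
  delete pending_set;
  pending_set = nullptr;
}

// Cleanup for runs that never reached process_pending().
// Deleting a null pointer is a no-op, so calling this twice is safe.
void finalize() {
  delete pending_set;
  pending_set = nullptr;
}
```

Resetting the pointer in both places is what makes repeated in-process compiles safe in this sketch: whichever path runs last, the next `init_once()` starts from a null pointer.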

tree-optimization/117343 - decide_masked_load_lanes and stale graph

It turns out decide_masked_load_lanes accesses a stale SLP graph
so the following re-builds it instead.

PR tree-optimization/117343
* tree-vect-slp.cc (vect_optimize_slp_pass::build_vertices):
Support re-building the SLP graph.
(vect_optimize_slp_pass::run): Re-build the SLP graph before
decide_masked_load_lanes.

tree-optimization/117333 - ICE with NULL access size DR

dr_may_alias_p ICEs when TYPE_SIZE of DR->ref is NULL, but this is
valid IL when the access size of an aggregate copy can be inferred
from the RHS.

PR tree-optimization/117333
* tree-data-ref.cc (dr_may_alias_p): Guard against NULL
access size.

* gcc.dg/torture/pr117333.c: New testcase.
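
As a rough illustration of guarding against a missing access size, the sketch below answers conservatively when a size is unknown. It is an assumption-laden example, not the tree-data-ref.cc change: the `Access` type and `may_alias` function are hypothetical, and the real guard may fall back differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a memory reference; not GCC's data_reference.
struct Access {
  std::size_t offset;       // start offset in bytes
  const std::size_t *size;  // access size in bytes, or nullptr if unknown
};

// Returns true if the two accesses may overlap.
bool may_alias(const Access &a, const Access &b) {
  // Without a known size no range-based disambiguation is possible,
  // so answer conservatively instead of dereferencing a null pointer.
  if (!a.size || !b.size)
    return true;
  // Half-open byte ranges [offset, offset + size) overlap test.
  return a.offset < b.offset + *b.size && b.offset < a.offset + *a.size;
}
```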

libstdc++: Use if consteval rather than if (std::__is_constant_evaluated()) for {,b}float16_t nextafter [PR117321]

The nextafter_c++23.cc testcase fails to link at -O0.
The problem is that even though std::__is_constant_evaluated() has the
always_inline attribute, at -O0 that just means that we inline the
call; its result is still assigned to a temporary which is tested
later, and nothing at -O0 propagates that false into the if and optimizes
away the if body.  And the __builtin_nextafterf16{,b} calls are meant
to be used solely for constant evaluation; the C libraries don't
define nextafterf16 these days.

As __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ are predefined right
now only by GCC, not by clang which doesn't implement the extended floating
point types paper, and as they are predefined in C++23 and later modes only,
I think we can just use if consteval which is folded already during the FE
and the body isn't included even at -O0.  I've added a feature test for
that just in case clang implements those and implements those in some weird
way.  Note, if (__builtin_is_constant_evaluated()) would work correctly too,
that is also folded to false at gimplification time and the corresponding
if block not emitted at all.  But for -O0 it can't be wrapped into a helper
inline function.

2024-10-29  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117321
* include/c_global/cmath (nextafter(_Float16, _Float16)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16 call.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16b call.
* testsuite/26_numerics/headers/cmath/117321.cc: New test.
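
The compile-time versus run-time dispatch discussed above can be sketched with the builtin written directly in the condition, which is the alternative the message mentions. It is used here instead of if consteval only so that the example also builds in pre-C++23 modes. The function names are hypothetical and the sketch assumes GCC or Clang.

```cpp
#include <cassert>

// Reports which path was taken: 1 during constant evaluation, 0 at run time.
constexpr int evaluation_kind() {
  // The builtin appears directly in the condition, not behind a helper
  // function, so the front end can fold it without any optimization.
  if (__builtin_is_constant_evaluated())
    return 1;  // only reached during constant evaluation
  return 0;    // ordinary run-time path
}

// A static_assert condition is a manifestly constant-evaluated context.
static_assert(evaluation_kind() == 1, "constant evaluation takes branch 1");

// A plain return statement is not, so this call takes the run-time path.
int at_run_time() {
  return evaluation_kind();
}

void self_check() {
  assert(at_run_time() == 0);
}
```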

ada: Fix static_assert with one argument

Single-argument static_assert requires C++17 and breaks the build with
older GCC (the prerequisite is C++14).

gcc/ada

* types.h: Fix static_assert.
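
For reference, the two spellings differ as follows. The condition below is an arbitrary illustration and is not the assertion from types.h.

```cpp
#include <climits>
#include <cstdint>

// Two-argument form: accepted from C++11 onwards, so it is safe when the
// bootstrap compiler only provides C++14.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32,
              "int32_t must be exactly 32 bits wide");

// One-argument form: the message became optional only in C++17, so a
// strict C++14 compiler rejects or warns about the line below.
// static_assert(sizeof(std::int32_t) * CHAR_BIT == 32);
```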

arm: [MVE intrinsics] Rework MVE vld/vst intrinsics

Implement the MVE vld and vst intrinsics using the MVE builtins framework.

The main part of the patch is to reimplement the vstr/vldr patterns
so that we now have far fewer of them:
- non-truncating stores
- predicated non-truncating stores
- truncating stores
- predicated truncating stores
- non-extending loads
- predicated non-extending loads
- extending loads
- predicated extending loads

This enables us to update the implementation of vld1/vst1 and use the
new vldr/vstr builtins.

The patch also adds support for the predicated vld1/vst1 versions.

gcc.target/arm/pr112337.c needs an update, to call the intrinsic
instead of the builtin, which this patch deletes.

2024-09-11 Alfie Richards <Alfie.Richards@arm.com>
Christophe Lyon <christophe.lyon@arm.com>

gcc/

* config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support
for predicated version.
(vst1q_impl): Likewise.
(vstrq_impl): New class.
(vldrq_impl): New class.
(vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
* config/arm/arm-mve-builtins-base.def (vld1q): Add predicated
version.
(vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vst1q): Add predicated version.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
(vrev32q): Update types to float_16.
* config/arm/arm-mve-builtins-base.h (vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
* config/arm/arm-mve-builtins-functions.h (memory_vector_mode):
Remove conversion of floating point vectors to integer.
* config/arm/arm-mve-builtins.cc (TYPES_float16): Change to...
(TYPES_float_16): ...this.
(TYPES_float_32): New.
(float16): Change to...
(float_16): ...this.
(float_32): New.
(preds_z_or_none): New.
(function_resolver::check_gp_argument): Add support for _z
predicate.
* config/arm/arm_mve.h (vstrbq): Remove.
(vstrbq_p): Likewise.
(vstrhq): Likewise.
(vstrhq_p): Likewise.
(vstrwq): Likewise.
(vstrwq_p): Likewise.
(vst1q_p): Likewise.
(vld1q_z): Likewise.
(vldrbq_s8): Likewise.
(vldrbq_u8): Likewise.
(vldrbq_s16): Likewise.
(vldrbq_u16): Likewise.
(vldrbq_s32): Likewise.
(vldrbq_u32): Likewise.
(vstrbq_s8): Likewise.
(vstrbq_s32): Likewise.
(vstrbq_s16): Likewise.
(vstrbq_u8): Likewise.
(vstrbq_u32): Likewise.
(vstrbq_u16): Likewise.
(vstrbq_p_s8): Likewise.
(vstrbq_p_s32): Likewise.
(vstrbq_p_s16): Likewise.
(vstrbq_p_u8): Likewise.
(vstrbq_p_u32): Likewise.
(vstrbq_p_u16): Likewise.
(vldrbq_z_s16): Likewise.
(vldrbq_z_u8): Likewise.
(vldrbq_z_s8): Likewise.
(vldrbq_z_s32): Likewise.
(vldrbq_z_u16): Likewise.
(vldrbq_z_u32): Likewise.
(vldrhq_s32): Likewise.
(vldrhq_s16): Likewise.
(vldrhq_u32): Likewise.
(vldrhq_u16): Likewise.
(vldrhq_z_s32): Likewise.
(vldrhq_z_s16): Likewise.
(vldrhq_z_u32): Likewise.
(vldrhq_z_u16): Likewise.
(vldrwq_s32): Likewise.
(vldrwq_u32): Likewise.
(vldrwq_z_s32): Likewise.
(vldrwq_z_u32): Likewise.
(vldrhq_f16): Likewise.
(vldrhq_z_f16): Likewise.
(vldrwq_f32): Likewise.
(vldrwq_z_f32): Likewise.
(vstrhq_f16): Likewise.
(vstrhq_s32): Likewise.
(vstrhq_s16): Likewise.
(vstrhq_u32): Likewise.
(vstrhq_u16): Likewise.
(vstrhq_p_f16): Likewise.
(vstrhq_p_s32): Likewise.
(vstrhq_p_s16): Likewise.
(vstrhq_p_u32): Likewise.
(vstrhq_p_u16): Likewise.
(vstrwq_f32): Likewise.
(vstrwq_s32): Likewise.
(vstrwq_u32): Likewise.
(vstrwq_p_f32): Likewise.
(vstrwq_p_s32): Likewise.
(vstrwq_p_u32): Likewise.
(vst1q_p_u8): Likewise.
(vst1q_p_s8): Likewise.
(vld1q_z_u8): Likewise.
(vld1q_z_s8): Likewise.
(vst1q_p_u16): Likewise.
(vst1q_p_s16): Likewise.
(vld1q_z_u16): Likewise.
(vld1q_z_s16): Likewise.
(vst1q_p_u32): Likewise.
(vst1q_p_s32): Likewise.
(vld1q_z_u32): Likewise.
(vld1q_z_s32): Likewise.
(vld1q_z_f16): Likewise.
(vst1q_p_f16): Likewise.
(vld1q_z_f32): Likewise.
(vst1q_p_f32): Likewise.
(__arm_vstrbq_s8): Likewise.
(__arm_vstrbq_s32): Likewise.
(__arm_vstrbq_s16): Likewise.
(__arm_vstrbq_u8): Likewise.
(__arm_vstrbq_u32): Likewise.
(__arm_vstrbq_u16): Likewise.
(__arm_vldrbq_s8): Likewise.
(__arm_vldrbq_u8): Likewise.
(__arm_vldrbq_s16): Likewise.
(__arm_vldrbq_u16): Likewise.
(__arm_vldrbq_s32): Likewise.
(__arm_vldrbq_u32): Likewise.
(__arm_vstrbq_p_s8): Likewise.
(__arm_vstrbq_p_s32): Likewise.
(__arm_vstrbq_p_s16): Likewise.
(__arm_vstrbq_p_u8): Likewise.
(__arm_vstrbq_p_u32): Likewise.
(__arm_vstrbq_p_u16): Likewise.
(__arm_vldrbq_z_s8): Likewise.
(__arm_vldrbq_z_s32): Likewise.
(__arm_vldrbq_z_s16): Likewise.
(__arm_vldrbq_z_u8): Likewise.
(__arm_vldrbq_z_u32): Likewise.
(__arm_vldrbq_z_u16): Likewise.
(__arm_vldrhq_s32): Likewise.
(__arm_vldrhq_s16): Likewise.
(__arm_vldrhq_u32): Likewise.
(__arm_vldrhq_u16): Likewise.
(__arm_vldrhq_z_s32): Likewise.
(__arm_vldrhq_z_s16): Likewise.
(__arm_vldrhq_z_u32): Likewise.
(__arm_vldrhq_z_u16): Likewise.
(__arm_vldrwq_s32): Likewise.
(__arm_vldrwq_u32): Likewise.
(__arm_vldrwq_z_s32): Likewise.
(__arm_vldrwq_z_u32): Likewise.
(__arm_vstrhq_s32): Likewise.
(__arm_vstrhq_s16): Likewise.
(__arm_vstrhq_u32): Likewise.
(__arm_vstrhq_u16): Likewise.
(__arm_vstrhq_p_s32): Likewise.
(__arm_vstrhq_p_s16): Likewise.
(__arm_vstrhq_p_u32): Likewise.
(__arm_vstrhq_p_u16): Likewise.
(__arm_vstrwq_s32): Likewise.
(__arm_vstrwq_u32): Likewise.
(__arm_vstrwq_p_s32): Likewise.
(__arm_vstrwq_p_u32): Likewise.
(__arm_vst1q_p_u8): Likewise.
(__arm_vst1q_p_s8): Likewise.
(__arm_vld1q_z_u8): Likewise.
(__arm_vld1q_z_s8): Likewise.
(__arm_vst1q_p_u16): Likewise.
(__arm_vst1q_p_s16): Likewise.
(__arm_vld1q_z_u16): Likewise.
(__arm_vld1q_z_s16): Likewise.
(__arm_vst1q_p_u32): Likewise.
(__arm_vst1q_p_s32): Likewise.
(__arm_vld1q_z_u32): Likewise.
(__arm_vld1q_z_s32): Likewise.
(__arm_vldrwq_f32): Likewise.
(__arm_vldrwq_z_f32): Likewise.
(__arm_vldrhq_z_f16): Likewise.
(__arm_vldrhq_f16): Likewise.
(__arm_vstrwq_p_f32): Likewise.
(__arm_vstrwq_f32): Likewise.
(__arm_vstrhq_f16): Likewise.
(__arm_vstrhq_p_f16): Likewise.
(__arm_vld1q_z_f16): Likewise.
(__arm_vst1q_p_f16): Likewise.
(__arm_vld1q_z_f32): Likewise.
(__arm_vst2q_f32): Likewise.
(__arm_vst1q_p_f32): Likewise.
(__arm_vstrbq): Likewise.
(__arm_vstrbq_p): Likewise.
(__arm_vstrhq): Likewise.
(__arm_vstrhq_p): Likewise.
(__arm_vstrwq): Likewise.
(__arm_vstrwq_p): Likewise.
(__arm_vst1q_p): Likewise.
(__arm_vld1q_z): Likewise.
* config/arm/arm_mve_builtins.def:
(vstrbq_s): Delete.
(vstrbq_u): Likewise.
(vldrbq_s): Likewise.
(vldrbq_u): Likewise.
(vstrbq_p_s): Likewise.
(vstrbq_p_u): Likewise.
(vldrbq_z_s): Likewise.
(vldrbq_z_u): Likewise.
(vld1q_u): Likewise.
(vld1q_s): Likewise.
(vldrhq_z_u): Likewise.
(vldrhq_u): Likewise.
(vldrhq_z_s): Likewise.
(vldrhq_s): Likewise.
(vld1q_f): Likewise.
(vldrhq_f): Likewise.
(vldrhq_z_f): Likewise.
(vldrwq_f): Likewise.
(vldrwq_s): Likewise.
(vldrwq_u): Likewise.
(vldrwq_z_f): Likewise.
(vldrwq_z_s): Likewise.
(vldrwq_z_u): Likewise.
(vst1q_u): Likewise.
(vst1q_s): Likewise.
(vstrhq_p_u): Likewise.
(vstrhq_u): Likewise.
(vstrhq_p_s): Likewise.
(vstrhq_s): Likewise.
(vst1q_f): Likewise.
(vstrhq_f): Likewise.
(vstrhq_p_f): Likewise.
(vstrwq_f): Likewise.
(vstrwq_s): Likewise.
(vstrwq_u): Likewise.
(vstrwq_p_f): Likewise.
(vstrwq_p_s): Likewise.
(vstrwq_p_u): Likewise.
* config/arm/iterators.md (MVE_w_narrow_TYPE): New iterator.
(MVE_w_narrow_type): New iterator.
(MVE_wide_n_TYPE): New attribute.
(MVE_wide_n_type): New attribute.
(MVE_wide_n_sz_elem): New attribute.
(MVE_wide_n_VPRED): New attribute.
(MVE_elem_ch): New attribute.
(supf): Remove VSTRBQ_S, VSTRBQ_U, VLDRBQ_S, VLDRBQ_U, VLD1Q_S,
VLD1Q_U, VLDRHQ_S, VLDRHQ_U, VLDRWQ_S, VLDRWQ_U, VST1Q_S, VST1Q_U,
VSTRHQ_S, VSTRHQ_U, VSTRWQ_S, VSTRWQ_U.
(VSTRBQ, VLDRBQ, VLD1Q, VLDRHQ, VLDRWQ, VST1Q, VSTRHQ, VSTRWQ):
Delete.
* config/arm/mve.md (mve_vstrbq_<supf><mode>): Remove.
(mve_vldrbq_<supf><mode>): Likewise.
(mve_vstrbq_p_<supf><mode>): Likewise.
(mve_vldrbq_z_<supf><mode>): Likewise.
(mve_vldrhq_fv8hf): Likewise.
(mve_vldrhq_<supf><mode>): Likewise.
(mve_vldrhq_z_fv8hf): Likewise.
(mve_vldrhq_z_<supf><mode>): Likewise.
(mve_vldrwq_fv4sf): Likewise.
(mve_vldrwq_<supf>v4si): Likewise.
(mve_vldrwq_z_fv4sf): Likewise.
(mve_vldrwq_z_<supf>v4si): Likewise.
(@mve_vld1q_f<mode>): Likewise.
(@mve_vld1q_<supf><mode>): Likewise.
(mve_vstrhq_fv8hf): Likewise.
(mve_vstrhq_p_fv8hf): Likewise.
(mve_vstrhq_p_<supf><mode>): Likewise.
(mve_vstrhq_<supf><mode>): Likewise.
(mve_vstrwq_fv4sf): Likewise.
(mve_vstrwq_p_fv4sf): Likewise.
(mve_vstrwq_p_<supf>v4si): Likewise.
(mve_vstrwq_<supf>v4si): Likewise.
(@mve_vst1q_f<mode>): Likewise.
(@mve_vst1q_<supf><mode>): Likewise.
(@mve_vstrq_<mode>): New.
(@mve_vstrq_p_<mode>): New.
(@mve_vstrq_truncate_<mode>): New.
(@mve_vstrq_p_truncate_<mode>): New.
(@mve_vldrq_<mode>): New.
(@mve_vldrq_z_<mode>): New.
(@mve_vldrq_extend_<mode><US>): New.
(@mve_vldrq_z_extend_<mode><US>): New.
* config/arm/unspecs.md:
(VSTRBQ_S): Remove.
(VSTRBQ_U): Likewise.
(VLDRBQ_S): Likewise.
(VLDRBQ_U): Likewise.
(VLD1Q_F): Likewise.
(VLD1Q_S): Likewise.
(VLD1Q_U): Likewise.
(VLDRHQ_F): Likewise.
(VLDRHQ_U): Likewise.
(VLDRHQ_S): Likewise.
(VLDRWQ_F): Likewise.
(VLDRWQ_S): Likewise.
(VLDRWQ_U): Likewise.
(VSTRHQ_F): Likewise.
(VST1Q_S): Likewise.
(VST1Q_U): Likewise.
(VSTRHQ_U): Likewise.
(VSTRWQ_S): Likewise.
(VSTRWQ_U): Likewise.
(VSTRWQ_F): Likewise.
(VST1Q_F): Likewise.
(VLDRQ): New.
(VLDRQ_Z): Likewise.
(VLDRQ_EXT): Likewise.
(VLDRQ_EXT_Z): Likewise.
(VSTRQ): Likewise.
(VSTRQ_P): Likewise.
(VSTRQ_TRUNC): Likewise.
(VSTRQ_TRUNC_P): Likewise.

gcc/testsuite/
* gcc.target/arm/pr112337.c: Call intrinsic instead of builtin.