
Support -mevex512 for AVX512F intrins

gcc/ChangeLog:

* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Disable 512 bit gather
when !TARGET_EVEX512.
* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
Add TARGET_EVEX512.
(ix86_expand_int_sse_cmp): Ditto.
(ix86_expand_vector_init_one_nonzero): Disable subroutine
when !TARGET_EVEX512.
(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
(ix86_vectorize_vec_perm_const): Disable subroutine when
!TARGET_EVEX512.
* config/i386/i386.cc
(standard_sse_constant_p): Add TARGET_EVEX512.
(standard_sse_constant_opcode): Ditto.
(ix86_get_ssemov): Ditto.
(ix86_legitimate_constant_p): Ditto.
(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
when !TARGET_EVEX512.
* config/i386/i386.md (avx512f_512): New.
(movxi): Add TARGET_EVEX512.
(*movxi_internal_avx512f): Ditto.
(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
for alternative 13.
(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
alternative 9.
(*movhi_internal): Change alternative 11 to *Yv.
(*movdf_internal): Change alternative 12 to Yv.
(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
alternative 5 and 6.
(*mov<mode>_internal): Change alternative 4 to Yv.
(define_split for convert SF to DF): Add TARGET_EVEX512.
(extendbfsf2_1): Ditto.
* config/i386/predicates.md (bcst_mem_operand): Disable predicate
for 512 bit when !TARGET_EVEX512.
* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
(V48_AVX512VL): Ditto.
(V48_256_512_AVX512VL): Ditto.
(V48H_AVX512VL): Ditto.
(VI12_AVX512VL): Ditto.
(V): Ditto.
(V_512): Ditto.
(V_256_512): Ditto.
(VF): Ditto.
(VF1_VF2_AVX512DQ): Ditto.
(VFH): Ditto.
(VFB): Ditto.
(VF1): Ditto.
(VF1_AVX2): Ditto.
(VF2): Ditto.
(VF2H): Ditto.
(VF2_512_256): Ditto.
(VF2_512_256VL): Ditto.
(VF_512): Ditto.
(VFB_512): Ditto.
(VI48_AVX512VL): Ditto.
(VI1248_AVX512VLBW): Ditto.
(VF_AVX512VL): Ditto.
(VFH_AVX512VL): Ditto.
(VF1_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI8): Ditto.
(VI8_AVX512VL): Ditto.
(VI2_AVX512F): Ditto.
(VI4_AVX512F): Ditto.
(VI4_AVX512VL): Ditto.
(VI48_AVX512F_AVX512VL): Ditto.
(VI8_AVX2_AVX512F): Ditto.
(VI8_AVX_AVX512F): Ditto.
(V8FI): Ditto.
(V16FI): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI48_AVX512F): Ditto.
(VI48_AVX_AVX512F): Ditto.
(VI12_AVX_AVX512F): Ditto.
(VI148_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI48_512): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI4F_256_512): Ditto.
(VI48F_256_512): Ditto.
(VI48F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(V32_512): Ditto.
(AVX512MODE2P): Ditto.
(STORENT_MODE): Ditto.
(REDUC_PLUS_MODE): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(*andnot<mode>3): Change isa attribute to avx512f_512.
(*andnot<mode>3): Ditto.
(<code><mode>3): Ditto.
(<code>tf3): Ditto.
(FMAMODEM): Add TARGET_EVEX512.
(FMAMODE_AVX512): Ditto.
(VFH_SF_AVX512VL): Ditto.
(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
Ditto.
(avx512f_cvtdq2pd512_2): Ditto.
(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
Ditto.
(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
(vec_unpacks_lo_v16sf): Ditto.
(vec_unpacks_hi_v16sf): Ditto.
(vec_unpacks_float_hi_v16si): Ditto.
(vec_unpacks_float_lo_v16si): Ditto.
(vec_unpacku_float_hi_v16si): Ditto.
(vec_unpacku_float_lo_v16si): Ditto.
(vec_pack_sfix_trunc_v8df): Ditto.
(avx512f_vec_pack_sfix_v8df): Ditto.
(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(vec_extract_lo_v64qi): Ditto.
(vec_extract_hi_v64qi): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
(avx512f_movddup512<mask_name>): Ditto.
(avx512f_unpcklpd512<mask_name>): Ditto.
(*<avx512>_vternlog<mode>_all): Ditto.
(*<avx512>_vpternlog<mode>_1): Ditto.
(*<avx512>_vpternlog<mode>_2): Ditto.
(*<avx512>_vpternlog<mode>_3): Ditto.
(avx512f_shufps512_mask): Ditto.
(avx512f_shufps512_1<mask_name>): Ditto.
(avx512f_shufpd512_mask): Ditto.
(avx512f_shufpd512_1<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
(vec_dupv2df<mask_name>): Ditto.
(trunc<pmov_src_lower><mode>2): Ditto.
(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
(*avx512f_vpermvar_truncv8div8si_1): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
(truncv8div8qi2): Ditto.
(avx512f_<code>v8div16qi2): Ditto.
(*avx512f_<code>v8div16qi2_store_1): Ditto.
(*avx512f_<code>v8div16qi2_store_2): Ditto.
(avx512f_<code>v8div16qi2_mask): Ditto.
(*avx512f_<code>v8div16qi2_mask_1): Ditto.
(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
(vec_widen_umult_even_v16si<mask_name>): Ditto.
(*vec_widen_umult_even_v16si<mask_name>): Ditto.
(vec_widen_smult_even_v16si<mask_name>): Ditto.
(*vec_widen_smult_even_v16si<mask_name>): Ditto.
(VEC_PERM_AVX2): Ditto.
(one_cmpl<mode>2): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
(define_split to xor): Ditto.
(*andnot<mode>3): Ditto.
(define_split for ior): Ditto.
(*iornot<mode>3): Ditto.
(*xnor<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
(avx512f_pshufdv3_mask): Ditto.
(avx512f_pshufd_1<mask_name>): Ditto.
(*vec_extractv4ti): Ditto.
(VEXTRACTI128_MODE): Ditto.
(define_split to vec_extract): Ditto.
(VI1248_AVX512VL_AVX512BW): Ditto.
(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
(<insn>v16qiv16si2): Ditto.
(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
(<insn>v16hiv16si2): Ditto.
(avx512f_zero_extendv16hiv16si2_1): Ditto.
(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
(<insn>v8qiv8di2): Ditto.
(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
(<insn>v8hiv8di2): Ditto.
(avx512f_<code>v8siv8di2<mask_name>): Ditto.
(*avx512f_zero_extendv8siv8di2_1): Ditto.
(*avx512f_zero_extendv8siv8di2_2): Ditto.
(<insn>v8siv8di2): Ditto.
(avx512f_roundps512_sfix): Ditto.
(vashrv8di3): Ditto.
(vashrv16si3): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
(vec_dupv4sf): Add TARGET_EVEX512.
(*vec_dupv4si): Ditto.
(*vec_dupv2di): Ditto.
(vec_dup<mode>): Change isa attribute to avx512f_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
Ditto.
(avx512f_vcvtps2ph512_mask_sae): Ditto.
(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
Ditto.
(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
(INT_BROADCAST_MODE): Ditto.

Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Disable zmm broadcast for !TARGET_EVEX512.
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not use PVW_512 when no-evex512.
(ix86_simd_clone_adjust): Add evex512 target into string.
* config/i386/i386.cc (type_natural_mode): Report ABI warning
when using zmm register w/o evex512.
(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
libmvec call when !TARGET_EVEX512.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
when !TARGET_EVEX512.
(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
(STORE_MAX_PIECES): Ditto.

[PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
* config/i386/i386-builtins.cc
(ix86_init_mmx_sse_builtins): Ditto.

[PATCH 5/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit
intrins.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

[PATCH 4/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config.gcc: Add avx512bitalgvlintrin.h.
* config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/avx512bf16intrin.h: Ditto.
* config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit
intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h.
* config/i386/avx512erintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx512ifmaintrin.h: Ditto.
* config/i386/avx512pfintrin.h: Ditto.
* config/i386/avx512vbmi2intrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vnniintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vpopcntdqintrin.h: Ditto.
* config/i386/gfniintrin.h: Ditto.
* config/i386/immintrin.h: Add avx512bitalgvlintrin.h.
* config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins.
* config/i386/vpclmulqdqintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: New.

[PATCH 3/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512bwintrin.h: Add evex512 target for 512 bit
intrins.

[PATCH 2/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512dqintrin.h: Add evex512 target for 512 bit
intrins.

[PATCH 1/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512fintrin.h: Add evex512 target for 512 bit intrins.

Initial support for -mevex512

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): New.
(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
(ix86_handle_option): Handle EVEX512.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__
when AVX512VL is set.
* config/i386/i386-options.cc (isa2_opts): Handle EVEX512.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Set EVEX512 target if it is not
explicitly set when AVX512 is enabled. Disable
AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.
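
As a hedged illustration of how user code might react to these target
macros, the sketch below picks a vector width at compile time.
"__EVEX256__" is the macro named in the entry above; "__EVEX512__" is an
assumption here (the entry only says that EVEX512 is handled in
ix86_target_macros_internal), and both helper functions are hypothetical,
not part of GCC.

```c
#include <assert.h>

/* Hypothetical helper: report the widest EVEX vector length, in bits,
   that the compilation target advertises through predefined macros.
   __EVEX512__ is ASSUMED to accompany -mevex512; __EVEX256__ is the
   macro the ChangeLog says is defined when AVX512VL is set.  */
static int
max_evex_vector_bits (void)
{
#if defined (__EVEX512__)
  return 512;
#elif defined (__EVEX256__)
  return 256;
#else
  return 0;  /* No EVEX vector length advertised.  */
#endif
}

/* Hypothetical helper: the same width expressed in bytes.  */
static int
max_evex_vector_bytes (void)
{
  int bits = max_evex_vector_bits ();
  assert (bits % 8 == 0);
  return bits / 8;
}
```

On a compiler that defines neither macro, both helpers simply return 0, so
the sketch builds with or without AVX512 options.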

TEST: Fix dump FAIL for RVV (RISC-V vector)

As shown here: https://godbolt.org/z/3K9oK7fx3

ARM SVE shows FOLD_EXTRACT_LAST 2 times, whereas RVV shows it 4 times.

This is because RISC-V doesn't enable vec_pack_trunc, so the conversion
and fold_extract_last fail in the first analysis pass and then succeed
in the second.

So RVV shows "FOLD_EXTRACT_LAST" 4 times.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant.

rs6000: support 32bit inline lrint

gcc/
PR target/88558
* config/rs6000/rs6000.md (lrint<mode>di2): Remove TARGET_FPRND
from insn condition.
(lrint<mode>si2): New insn pattern for 32bit lrint.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr88558.h: New.
* gcc.target/powerpc/pr88558-p7.c: New.
* gcc.target/powerpc/pr88558-p8.c: New.

rs6000: enable SImode in FP register on P7

gcc/
PR target/88558
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Enable SImode on FP registers for P7.
* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
move between FP registers. Set attribute isa of stfiwx to "*"
and attribute of stxsiwx to "p7".

s390: Make use of new copysign RTL

gcc/ChangeLog:

* config/s390/s390.md: Make use of new copysign RTL.

[i386] APX EGPR: fix missing patterns that prohibit egpr

For some patterns, the m/Bm constraint in alternatives 0 and 1 could
result in an EGPR being allocated for the memory operand under -mapxf.
Use jm/ja instead.

gcc/ChangeLog:

* config/i386/sse.md (vec_concatv2di): Replace constraint "m"
with "jm" for alternative 0 and 1 of operand 2.
(sse4_1_<code><mode>3<mask_name>): Replace constraint "Bm" with
"ja" for alternative 0 and 1 of operand 2.

Daily bump.

libcpp: eliminate LINEMAPS_{ORDINARY,MACRO}_MAPS

libcpp/ChangeLog:
* include/line-map.h (LINEMAPS_ORDINARY_MAPS): Delete.
(LINEMAPS_MACRO_MAPS): Delete.
* line-map.cc (linemap_tracks_macro_expansion_locs_p): Update for
deletion of LINEMAPS_MACRO_MAPS.
(linemap_get_statistics): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate LINEMAPS_{,ORDINARY_,MACRO_}CACHE

It's simpler to use field access than to go through these inline
functions that look as if they are macros.

No functional change intended.

libcpp/ChangeLog:
* include/line-map.h (maps_info_ordinary::cache): Rename to...
(maps_info_ordinary::m_cache): ...this.
(maps_info_macro::cache): Rename to...
(maps_info_macro::m_cache): ...this.
(LINEMAPS_CACHE): Delete.
(LINEMAPS_ORDINARY_CACHE): Delete.
(LINEMAPS_MACRO_CACHE): Delete.
* init.cc (read_original_filename): Update for adding "m_" prefix.
* line-map.cc (linemap_add): Eliminate LINEMAPS_ORDINARY_CACHE in
favor of a simple field access.
(linemap_enter_macro): Likewise for LINEMAPS_MACRO_CACHE.
(linemap_ordinary_map_lookup): Likewise for
LINEMAPS_ORDINARY_CACHE, twice.
(linemap_lookup_macro_index): Likewise for LINEMAPS_MACRO_CACHE.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate LINEMAPS_LAST_ALLOCATED{,_ORDINARY,_MACRO}_MAP

Nothing uses these; delete them.

libcpp/ChangeLog:
* include/line-map.h (LINEMAPS_LAST_ALLOCATED_MAP): Delete.
(LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Delete.
(LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Delete.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: improvements to out-of-bounds diagrams [PR111155]

Update out-of-bounds diagrams to show existing string values,
and the initial write index within a string buffer.

For example, given the out-of-bounds write in strcat in:

void test (void)
{
  char buf[10];
  strcpy (buf, "hello");
  strcat (buf, " world!");
}

the diagram improves from:

                           ┌─────┬─────┬────┬────┬────┐┌─────┬─────┬─────┐
                           │ [0] │ [1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │
                           ├─────┼─────┼────┼────┼────┤├─────┼─────┼─────┤
                           │ ' ' │ 'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
                           ├─────┴─────┴────┴────┴────┴┴─────┴─────┴─────┤
                           │      string literal (type: 'char[8]')       │
                           └─────────────────────────────────────────────┘
                              │     │    │    │    │      │     │     │
                              │     │    │    │    │      │     │     │
                              v     v    v    v    v      v     v     v
  ┌─────┬────────────────────────────────────────┬────┐┌─────────────────┐
  │ [0] │                  ...                   │[9] ││                 │
  ├─────┴────────────────────────────────────────┴────┤│after valid range│
  │             'buf' (type: 'char[10]')              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮                ╭─────────┴─────────╮
                  │capacity: 10 bytes│                │overflow of 3 bytes│
                  ╰──────────────────╯                ╰───────────────────╯

to:

                             ┌────┬────┬────┬────┬────┐┌─────┬─────┬─────┐
                             │[0] │[1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │
                             ├────┼────┼────┼────┼────┤├─────┼─────┼─────┤
                             │' ' │'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
                             ├────┴────┴────┴────┴────┴┴─────┴─────┴─────┤
                             │     string literal (type: 'char[8]')      │
                             └───────────────────────────────────────────┘
                               │    │    │    │    │      │     │     │
                               │    │    │    │    │      │     │     │
                               v    v    v    v    v      v     v     v
  ┌─────┬────────────────────┬────┬──────────────┬────┐┌─────────────────┐
  │ [0] │        ...         │[5] │     ...      │[9] ││                 │
  ├─────┼────┬────┬────┬────┬┼────┼──────────────┴────┘│                 │
  │ 'h' │'e' │'l' │'l' │'o' ││NUL │                    │after valid range│
  ├─────┴────┴────┴────┴────┴┴────┴───────────────────┐│                 │
  │             'buf' (type: 'char[10]')              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮                ╭─────────┴─────────╮
                  │capacity: 10 bytes│                │overflow of 3 bytes│
                  ╰──────────────────╯                ╰───────────────────╯
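
The numbers in the diagram can be rechecked with a few lines of C. This is
only a sketch of the arithmetic, not the analyzer's implementation; the
struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of where a strcat-style write lands.  */
struct write_bounds
{
  size_t write_start;  /* Index of the first byte written.  */
  size_t write_end;    /* One past the last byte written.  */
  size_t overflow;     /* Bytes written past the buffer's capacity.  */
};

/* Compute where strcat (buf, src) writes when BUF currently holds
   EXISTING and has room for CAPACITY bytes in total.  */
static struct write_bounds
compute_strcat_bounds (size_t capacity, const char *existing,
                       const char *src)
{
  struct write_bounds b;
  /* strcat starts at the existing terminator and overwrites it.  */
  b.write_start = strlen (existing);
  /* It copies SRC including its trailing NUL.  */
  b.write_end = b.write_start + strlen (src) + 1;
  b.overflow = b.write_end > capacity ? b.write_end - capacity : 0;
  assert (b.write_start <= b.write_end);
  return b;
}
```

For the example above, compute_strcat_bounds (10, "hello", " world!") gives
a write starting at index 5 and ending at index 13, which is 3 bytes past
the 10-byte capacity.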

gcc/analyzer/ChangeLog:
PR analyzer/111155
* access-diagram.cc (boundaries::boundaries): Add logger param
(boundaries::add): Add logging.
(boundaries::get_hard_boundaries_in_range): New.
(boundaries::m_logger): New field.
(boundaries::get_table_x_for_offset): Make public.
(class svalue_spatial_item): New.
(class compound_svalue_spatial_item): New.
(add_ellipsis_to_gaps): New.
(valid_region_spatial_item::valid_region_spatial_item): Add theme
param.  Initialize m_boundaries, m_existing_sval, and
m_existing_sval_spatial_item.
(valid_region_spatial_item::add_boundaries): Set m_boundaries.
Add boundaries for any m_existing_sval_spatial_item.
(valid_region_spatial_item::add_array_elements_to_table): Rewrite
creation of min/max index in terms of
maybe_add_array_index_to_table.  Rewrite ellipsis code using
add_ellipsis_to_gaps. Add index values for any hard boundaries
within the valid region.
(valid_region_spatial_item::maybe_add_array_index_to_table): New,
based on code formerly in add_array_elements_to_table.
(valid_region_spatial_item::make_table): Make use of
m_existing_sval_spatial_item, if any.
(valid_region_spatial_item::m_boundaries): New field.
(valid_region_spatial_item::m_existing_sval): New field.
(valid_region_spatial_item::m_existing_sval_spatial_item): New
field.
(class svalue_spatial_item): Rename to...
(class written_svalue_spatial_item): ...this.
(class string_region_spatial_item): Rename to...
(class string_literal_spatial_item): ...this.  Add "kind".
(string_literal_spatial_item::add_boundaries): Use m_kind to
determine kind of boundary.  Update for renaming of m_actual_bits
to m_bits.
(string_literal_spatial_item::make_table): Likewise.  Support not
displaying a row for byte indexes, and not displaying a row for
the type.
(string_literal_spatial_item::add_column_for_byte): Make byte index
row optional.
(svalue_spatial_item::make): Convert to...
(make_written_svalue_spatial_item): ...this.
(make_existing_svalue_spatial_item): New.
(access_diagram_impl::access_diagram_impl): Pass theme to
m_valid_region_spatial_item ctor.  Update for renaming of
m_svalue_spatial_item.
(access_diagram_impl::find_boundaries): Pass logger to boundaries.
Update for renaming of...
(access_diagram_impl::m_svalue_spatial_item): Rename to...
(access_diagram_impl::m_written_svalue_spatial_item): ...this.

gcc/testsuite/ChangeLog:
PR analyzer/111155
* c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c: New test.
* c-c++-common/analyzer/out-of-bounds-diagram-strcat.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-17.c: Update expected
result to show the existing content of "buf" and the index at
which the write starts.
* gcc.dg/analyzer/out-of-bounds-diagram-18.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-19.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-6.c: Update expected
output.

gcc/ChangeLog:
PR analyzer/111155
* text-art/table.cc (table::maybe_set_cell_span): New.
(table::add_other_table): New.
* text-art/table.h (class table::cell_placement): Add class table
as a friend.
(table::add_rows): New.
(table::add_row): Reimplement in terms of add_rows.
(table::maybe_set_cell_span): New decl.
(table::add_other_table): New decl.
* text-art/types.h (operator+): New operator for rect + coord.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate COMBINE_LOCATION_DATA

This patch eliminates the function "COMBINE_LOCATION_DATA" (which hasn't
been a macro since r6-739-g0501dbd932a7e9) and the function
"get_combined_adhoc_loc" in favor of a new
line_maps::get_or_create_combined_loc member function.

No functional change intended.

gcc/cp/ChangeLog:
* module.cc (module_state::read_location): Update for renaming of
get_combined_adhoc_loc.

gcc/ChangeLog:
* genmatch.cc (main): Update for "m_" prefix of some fields of
line_maps.
* input.cc (make_location): Update for removal of
COMBINE_LOCATION_DATA.
(dump_line_table_statistics): Update for "m_" prefix of some
fields of line_maps.
(location_with_discriminator): Update for removal of
COMBINE_LOCATION_DATA.
(line_table_test::line_table_test): Update for "m_" prefix of some
fields of line_maps.
* toplev.cc (general_init): Likewise.
* tree.cc (set_block): Update for removal of
COMBINE_LOCATION_DATA.
(set_source_range): Likewise.

libcpp/ChangeLog:
* include/line-map.h (line_maps::reallocator): Rename to...
(line_maps::m_reallocator): ...this.
(line_maps::round_alloc_size): Rename to...
(line_maps::m_round_alloc_size): ...this.
(line_maps::location_adhoc_data_map): Rename to...
(line_maps::m_location_adhoc_data_map): ...this.
(line_maps::num_optimized_ranges): Rename to...
(line_maps::m_num_optimized_ranges): ...this.
(line_maps::num_unoptimized_ranges): Rename to...
(line_maps::m_num_unoptimized_ranges): ...this.
(get_combined_adhoc_loc): Delete decl.
(COMBINE_LOCATION_DATA): Delete.
* lex.cc (get_location_for_byte_range_in_cur_line): Update for
removal of COMBINE_LOCATION_DATA.
(warn_about_normalization): Likewise.
(_cpp_lex_direct): Likewise.
* line-map.cc (line_maps::~line_maps): Update for "m_" prefix of
some fields of line_maps.
(rebuild_location_adhoc_htab): Likewise.
(can_be_stored_compactly_p): Convert to...
(line_maps::can_be_stored_compactly_p): ...this private member
function.
(get_combined_adhoc_loc): Convert to...
(line_maps::get_or_create_combined_loc): ...this public member
function.
(line_maps::make_location): Update for removal of
COMBINE_LOCATION_DATA.
(get_data_from_adhoc_loc): Update for "m_" prefix of some fields
of line_maps.
(get_discriminator_from_adhoc_loc): Likewise.
(get_location_from_adhoc_loc): Likewise.
(get_range_from_adhoc_loc): Convert to...
(line_maps::get_range_from_adhoc_loc): ...this private member
function.
(line_maps::get_range_from_loc): Update for conversion of
get_range_from_adhoc_loc to a member function.
(linemap_init): Update for "m_" prefix of some fields of
line_maps.
(line_map_new_raw): Likewise.
(linemap_enter_macro): Likewise.
(linemap_get_statistics): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: "const" and other cleanups

No functional change intended.

gcc/ChangeLog:
* input.cc (make_location): Move implementation to
line_maps::make_location.

libcpp/ChangeLog:
* include/line-map.h (line_maps::pure_location_p): New decl.
(line_maps::get_pure_location): New decl.
(line_maps::get_range_from_loc): New decl.
(line_maps::get_start): New.
(line_maps::get_finish): New.
(line_maps::make_location): New decl.
(get_range_from_loc): Make line_maps param const.
(get_discriminator_from_loc): Likewise.
(pure_location_p): Likewise.
(get_pure_location): Likewise.
(linemap_check_files_exited): Likewise.
(linemap_tracks_macro_expansion_locs_p): Likewise.
(linemap_location_in_system_header_p): Likewise.
(linemap_location_from_macro_definition_p): Likewise.
(linemap_macro_map_loc_unwind_toward_spelling): Likewise.
(linemap_included_from_linemap): Likewise.
(first_map_in_common): Likewise.
(linemap_compare_locations): Likewise.
(linemap_location_before_p): Likewise.
(linemap_resolve_location): Likewise.
(linemap_unwind_toward_expansion): Likewise.
(linemap_unwind_to_first_non_reserved_loc): Likewise.
(linemap_expand_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(linemap_get_statistics): Likewise.
(linemap_dump_location): Likewise.
(linemap_dump): Likewise.
(line_table_dump): Likewise.
* internal.h (linemap_get_expansion_line): Likewise.
(linemap_get_expansion_filename): Likewise.
* line-map.cc (can_be_stored_compactly_p): Likewise.
(get_data_from_adhoc_loc): Drop redundant "class".
(get_discriminator_from_adhoc_loc): Likewise.
(get_location_from_adhoc_loc): Likewise.
(get_range_from_adhoc_loc): Likewise.
(get_range_from_loc): Make const and move implementation to...
(line_maps::get_range_from_loc): ...this new function.
(get_discriminator_from_loc): Make line_maps param const.
(pure_location_p): Make const and move implementation to...
(line_maps::pure_location_p): ...this new function.
(get_pure_location): Make const and move implementation to...
(line_maps::get_pure_location): ...this new function.
(linemap_included_from_linemap): Make line_maps param const.
(linemap_check_files_exited): Likewise.
(linemap_tracks_macro_expansion_locs_p): Likewise.
(linemap_macro_map_loc_unwind_toward_spelling): Likewise.
(linemap_get_expansion_line): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_location_in_system_header_p): Likewise.
(first_map_in_common_1): Likewise.
(linemap_compare_locations): Likewise.
(linemap_macro_loc_to_spelling_point): Likewise.
(linemap_macro_loc_to_def_point): Likewise.
(linemap_macro_loc_to_exp_point): Likewise.
(linemap_resolve_location): Likewise.
(linemap_location_from_macro_definition_p): Likewise.
(linemap_unwind_toward_expansion): Likewise.
(linemap_unwind_to_first_non_reserved_loc): Likewise.
(linemap_expand_location): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(linemap_get_statistics): Likewise.
(line_table_dump): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix ICE on sarif output when source file is unreadable [PR111700]

gcc/ChangeLog:
PR driver/111700
* input.cc (file_cache::add_file): Update leading comment to
clarify that it can fail.
(file_cache::lookup_or_add_file): Likewise.
(file_cache::get_source_file_content): Gracefully handle
lookup_or_add_file failing.

gcc/testsuite/ChangeLog:
PR driver/111700
* c-c++-common/diagnostic-format-sarif-file-pr111700.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_build_const_vector): Handle V2HF
and V4HFmode.
(ix86_build_signbit_mask): Ditto.
* config/i386/mmx.md (mmxintvecmode): Ditto.
(<code><mode>2): New define_expand.
(*mmx_<code><mode>): New define_insn_and_split.
(*mmx_nabs<mode>2): Ditto.
(*mmx_andnot<mode>3): New define_insn.
(<code><mode>3): Ditto.
(copysign<mode>3): New define_expand.
(xorsign<mode>3): Ditto.
(signbit<mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-absneghf.c: New test.
* gcc.target/i386/part-vect-copysignhf.c: New test.
* gcc.target/i386/part-vect-xorsignhf.c: New test.

Support smin/smax for V2HF/V4HF

gcc/ChangeLog:

* config/i386/mmx.md (VHF_32_64): New mode iterator.
(<insn><mode>3): New define_expand, merged from ..
(<insn>v4hf3): .. this and
(<insn>v2hf3): .. this.
(movd_v2hf_to_sse_reg): New define_expand, split from ..
(movd_v2hf_to_sse): .. this.
(<code><mode>3): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-vminmaxph-1.c: New test.
* gcc.target/i386/avx512fp16-64-32-vecop-1.c: Scan-assembler
only for { target { ! ia32 } }.

Fortran/OpenMP: Fix handling of strictly structured blocks

For strictly structured blocks, a BLOCK was created but the code
was placed after the block, in the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly inside a
BLOCK, this solves additional issues.

gcc/fortran/ChangeLog:

* parse.cc (parse_omp_structured_block): Make the user code end
up inside of BLOCK construct for strictly structured blocks;
fix fallout for 'section' and 'teams'.
* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
for teams in target checking.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/block_17.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.

rs6000: build constant via li/lis;rldic

This patch checks whether a constant can be built by "li;rldic".
Only "negative li" needs to be handled; the other forms need no check.
For example, "negative lis" is just a "negative li" with an additional shift.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via li/lis;rldicl/rldicr

If a constant can be obtained by clearing the left or right bits of a
rotated negative "li/lis" value, then use "li/lis ; rldicl/rldicr"
to build the constant.
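
To make the instruction pairing concrete, here is a small C emulation of
the four instructions involved. It is a sketch of the instruction
semantics only, under the usual Power ISA bit numbering (bit 0 is the most
significant); it is not the search logic added to rs6000.cc, and the
function names merely mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by N bits (N taken modulo 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Value left by "li rD,imm": the 16-bit immediate, sign-extended.  */
static uint64_t
li (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Value left by "lis rD,imm": the immediate times 2^16, sign-extended.  */
static uint64_t
lis (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}

/* rldicl: rotate left by SH, then clear the MB most significant bits.  */
static uint64_t
rldicl (uint64_t x, unsigned sh, unsigned mb)
{
  assert (sh < 64 && mb < 64);
  return rotl64 (x, sh) & (UINT64_MAX >> mb);
}

/* rldicr: rotate left by SH, then clear the 63 - ME least significant
   bits.  */
static uint64_t
rldicr (uint64_t x, unsigned sh, unsigned me)
{
  assert (sh < 64 && me < 64);
  return rotl64 (x, sh) & (UINT64_MAX << (63 - me));
}
```

With these helpers, rldicl (li (-2), 0, 4) yields 0x0FFFFFFFFFFFFFFE and
rldicr (li (-2), 60, 59) yields 0xEFFFFFFFFFFFFFF0, two constants that a
single "li" or "lis" cannot produce on its own.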

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via lis;rotldi

If a constant can be rotated to/from a negative value from "lis",
then use "lis;rotldi" to build the constant.

The positive value of "lis" does not need to be analyzed, because if a
constant can be rotated from a positive value of "lis", it can also be
rotated from a positive value of "li".

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via li;rotldi

If a constant can be rotated to/from a positive or negative value
that "li" can generate, then "li;rotldi" can be used to build
the constant.
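
The condition can be illustrated with a brute-force check in C. This is a
sketch under the assumption that "li" loads a sign-extended 16-bit
immediate; the function names are hypothetical, and the real
can_be_built_by_li_and_rotldi may well use a cheaper test than trying all
64 rotations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by N bits (N taken modulo 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* True if X is a value "li" can load, i.e. a sign-extended 16-bit
   immediate: bits 15..63 must all be equal.  */
static bool
fits_li (uint64_t x)
{
  uint64_t high = x >> 15;
  return high == 0 || high == (UINT64_MAX >> 15);
}

/* Try every rotation of C.  On success store the "li" immediate in *IMM
   and the "rotldi" amount in *ROT so that rotl64 (imm, rot) == C.  */
static bool
find_li_rotldi (uint64_t c, int16_t *imm, unsigned *rot)
{
  assert (imm && rot);
  for (unsigned r = 0; r < 64; r++)
    {
      uint64_t v = rotl64 (c, r);
      if (fits_li (v))
        {
          /* V fits in 16 signed bits; rebuild it from its low half.  */
          uint16_t low = (uint16_t) (v & 0xFFFF);
          *imm = low < 0x8000 ? (int16_t) low : (int16_t) (low - 0x10000);
          *rot = (64 - r) & 63;  /* Undo the rotation applied above.  */
          return true;
        }
    }
  return false;
}
```

For instance, 0x7000000000000001 is found as "li 23" followed by a rotate
left of 60, while 0x0123456789ABCDEF has no rotation that fits and is
rejected.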

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.

[i386] Fix apx test fails on 32bit target

Since -mapxf works similarly to -muintr, which emits an error for 32-bit
targets, add a !ia32 target guard to the APX-related tests.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: Compile for non-ia32.
* gcc.target/i386/apx-inline-gpr-norex2.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: Likewise.

RISC-V: add static-pie support

We only need to pass options to the linker when static-pie is passed.
A separate patch enables static-pie in glibc, but it needs to be
enabled in GCC first.

gcc/ChangeLog:

* config/riscv/linux.h: Pass the static-pie specific options to
the linker.

Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>

TEST: Fix XPASS of TSVC testsuites for RVV

Fix the following XPASSes of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s272.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s273.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s274.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s278.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s3111.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s441.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s443.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-vif.c: Ditto.

RISC-V: Enable more tests of "vect" for RVV

This patch enables almost full coverage of the vectorization tests for
RVV, except for the following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure whether we will need to enable these:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1

FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"

They are all dump FAILs (no more ICEs or execution FAILs).

Fixing those FAILs will be done in a separate patch.

But I think we should commit this patch first.

Ok for trunk?

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable more vect tests for RVV.

Daily bump.

aarch64: Enable Cortex-X4 CPU

This patch adds support for the Cortex-X4 CPU to GCC.

gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for
cortex-x4 core.
* config/aarch64/aarch64-tune.md: Regenerated.
* doc/invoke.texi: Add command-line option for cortex-x4 core.

Revert "RISC-V: Add more run test for FP rounding autovec"

Reverted because it introduced other failures.

This reverts commit 7866984ba427dc56a12ee1b8d99feb4927b834b1.

[APX EGPR] Handle vex insns that only support GPR16 (5/5)

These vex insns may have legacy counterparts that could support EGPR,
but they do not have evex counterparts. Split the vex part out of the
patterns and mark it as not supporting EGPR by adjusting constraints
and attr_gpr32.

insn list:
1. vmovmskpd/vmovmskps
2. vpmovmskb
3. vrsqrtss/vrsqrtps
4. vrcpss/vrcpps
5. vhaddpd/vhaddps, vhsubpd/vhsubps
6. vldmxcsr/vstmxcsr
7. vaddsubpd/vaddsubps
8. vlddqu
9. vtestps/vtestpd
10. vmaskmovps/vmaskmovpd, vpmaskmovd/vpmaskmovq
11. vperm2f128/vperm2i128
12. vinserti128/vinsertf128
13. vbroadcasti128/vbroadcastf128
14. vcmppd/vcmpps, vcmpss/vcmpsd
15. vgatherdps/vgatherqps, vgatherdpd/vgatherqpd

gcc/ChangeLog:

* config/i386/constraints.md (jb): New constraint for vsib memory
that does not allow gpr32.
* config/i386/i386.md (setcc_<mode>_sse): Replace m to jm for avx
alternative and set attr_gpr32 to 0.
(movmsk_df): Split avx/noavx alternatives and replace "r" to "jr" for
avx alternative.
(<sse>_rcp<mode>2): Split avx/noavx alternatives and replace
"m/Bm" to "jm/ja" for avx alternative, set its gpr32 attr to 0.
(*rsqrtsf2_sse): Likewise.
* config/i386/mmx.md (mmx_pmovmskb): Split alternative 1 to
avx/noavx and assign jr/r constraint to dest.
* config/i386/sse.md (<sse>_movmsk<ssemodesuffix><avxsizesuffix>):
Split avx/noavx alternatives and replace "r" to "jr" for avx alternative.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): Likewise.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt): Likewise.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): Likewise.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift): Likewise.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): Likewise.
(<sse2_avx2>_pmovmskb): Likewise.
(*<sse2_avx2>_pmovmskb_zext): Likewise.
(*sse2_pmovmskb_ext): Likewise.
(*<sse2_avx2>_pmovmskb_lt): Likewise.
(*<sse2_avx2>_pmovmskb_zext_lt): Likewise.
(*sse2_pmovmskb_ext_lt): Likewise.
(<sse>_rcp<mode>2): Split avx/noavx alternatives and replace
"m/Bm" to "jm/ja" for avx alternative, set its attr_gpr32 to 0.
(sse_vmrcpv4sf2): Likewise.
(*sse_vmrcpv4sf2): Likewise.
(rsqrt<mode>2): Likewise.
(sse_vmrsqrtv4sf2): Likewise.
(*sse_vmrsqrtv4sf2): Likewise.
(avx_h<insn>v4df3): Likewise.
(sse3_hsubv2df3): Likewise.
(avx_h<insn>v8sf3): Likewise.
(sse3_h<insn>v4sf3): Likewise.
(<sse3>_lddqu<avxsizesuffix>): Likewise.
(avx_cmp<mode>3): Likewise.
(avx_vmcmp<mode>3): Likewise.
(*sse2_gt<mode>3): Likewise.
(sse_ldmxcsr): Likewise.
(sse_stmxcsr): Likewise.
(avx_vtest<ssemodesuffix><avxsizesuffix>): Replace m to jm for
avx alternative and set attr_gpr32 to 0.
(avx2_permv2ti): Likewise.
(*avx_vperm2f128<mode>_full): Likewise.
(*avx_vperm2f128<mode>_nozero): Likewise.
(vec_set_lo_v32qi): Likewise.
(<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>): Likewise.
(<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>): Likewise.
(avx_cmp<mode>3): Likewise.
(avx_vmcmp<mode>3): Likewise.
(*<sse>_maskcmp<mode>3_comm): Likewise.
(*avx2_gathersi<VEC_GATHER_MODE:mode>): Replace Tv to jb and set
attr_gpr32 to 0.
(*avx2_gathersi<VEC_GATHER_MODE:mode>_2): Likewise.
(*avx2_gatherdi<VEC_GATHER_MODE:mode>): Likewise.
(*avx2_gatherdi<VEC_GATHER_MODE:mode>_2): Likewise.
(*avx2_gatherdi<VI4F_256:mode>_3): Likewise.
(*avx2_gatherdi<VI4F_256:mode>_4): Likewise.
(avx_vbroadcastf128_<mode>): Restrict non-egpr alternative to
noavx512vl, set its constraint to jm and set attr_gpr32 to 0.
(vec_set_lo_<mode><mask_name>): Likewise.
(vec_set_lo_<mode><mask_name>): Likewise for SF/SI modes.
(vec_set_hi_<mode><mask_name>): Likewise.
(vec_set_hi_<mode><mask_name>): Likewise for SF/SI modes.
(vec_set_hi_<mode>): Likewise.
(vec_set_lo_<mode>): Likewise.
(avx2_set_hi_v32qi): Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX_EGPR] Handle legacy insns that only support GPR16 (4/5)

APX-enabled hardware should also be AVX10-enabled, so for map2/3 insns
with an evex counterpart, we assume auto promotion to EGPR under APX_F if
the insn uses GPR32. For the insns below, we therefore disable EGPR usage
for their sse mnemonics while allowing EGPR generation for their
v-prefixed mnemonics.

insn list:
1. pabsb/pabsw/pabsd
2. pextrb/pextrw/pextrd/pextrq
3. pinsrb/pinsrd/pinsrq
4. pshufb
5. extractps/insertps
6. pmaddubsw
7. pmulhrsw
8. packusdw
9. palignr
10. movntdqa
11. mpsadbw
12. pmuldq/pmulld
13. pmaxsb/pmaxsd, pminsb/pminsd
    pmaxud/pmaxuw, pminud/pminuw
14. (pmovsxbw/pmovsxbd/pmovsxbq,
     pmovsxwd/pmovsxwq, pmovsxdq
     pmovzxbw/pmovzxbd/pmovzxbq,
     pmovzxwd/pmovzxwq, pmovzxdq)
15. aesdec/aesdeclast, aesenc/aesenclast
16. pclmulqdq
17. gf2p8affineqb/gf2p8affineinvqb/gf2p8mulb

gcc/ChangeLog:

* config/i386/i386.md (*movhi_internal): Split out non-gpr
supported pextrw with mem constraint to avx/noavx alternatives,
set jm and attr gpr32 0 to the noavx alternative.
(*mov<mode>_internal): Likewise.
* config/i386/mmx.md (mmx_pshufbv8qi3): Change "r/m/Bm" to
"jr/jm/ja" and set_attr gpr32 0 for noavx alternative.
(mmx_pshufbv4qi3): Likewise.
(*mmx_pinsrd): Likewise.
(*mmx_pinsrb): Likewise.
(*pinsrb): Likewise.
(mmx_pshufbv8qi3): Likewise.
(mmx_pshufbv4qi3): Likewise.
(@sse4_1_insertps_<mode>): Likewise.
(*mmx_pextrw): Split alternatives and map non-EGPR
constraints, attr_gpr32 and attr_isa to noavx mnemonics.
(*movv2qi_internal): Likewise.
(*pextrw): Likewise.
(*mmx_pextrb): Likewise.
(*mmx_pextrb_zext): Likewise.
(*pextrb): Likewise.
(*pextrb_zext): Likewise.
(vec_extractv2si_1): Likewise.
(vec_extractv2si_1_zext): Likewise.
* config/i386/sse.md (vi128_h_r): New mode attr for
pinsr{bw}/pextr{bw} with reg operand.
(*abs<mode>2): Split alternatives and %v in mnemonics, map
non-EGPR constraints, gpr32 and isa attrs to noavx mnemonics.
(*vec_extract<mode>): Likewise.
(*vec_extract<mode>): Likewise for HFBF pattern.
(*vec_extract<PEXTR_MODE12:mode>_zext): Likewise.
(*vec_extractv4si_1): Likewise.
(*vec_extractv4si_zext): Likewise.
(*vec_extractv2di_1): Likewise.
(*vec_concatv2si_sse4_1): Likewise.
(<sse2p4_1>_pinsr<ssemodesuffix>): Likewise.
(vec_concatv2di): Likewise.
(*sse4_1_<code>v2qiv2di2<mask_name>_1): Likewise.
(<ssse3_avx2>_pshufb<mode>3<mask_name>): Change "r/m/Bm" to
"jr/jm/ja" and set_attr gpr32 0 for noavx alternative, split
%v for avx/noavx alternatives if necessary.
(*vec_concatv2sf_sse4_1): Likewise.
(*sse4_1_extractps): Likewise.
(vec_set<mode>_0): Likewise for VI4F_128.
(*vec_setv4sf_sse4_1): Likewise.
(@sse4_1_insertps<mode>): Likewise.
(ssse3_pmaddubsw128): Likewise.
(*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>): Likewise.
(<sse4_1_avx2>_packusdw<mask_name>): Likewise.
(<ssse3_avx2>_palignr<mode>): Likewise.
(<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
(<sse4_1_avx2>_mpsadbw): Likewise.
(*sse4_1_mulv2siv2di3<mask_name>): Likewise.
(*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
(*sse4_1_<code><mode>3<mask_name>): Likewise.
(*<code>v8hi3): Likewise.
(*<code>v16qi3): Likewise.
(*sse4_1_<code>v8qiv8hi2<mask_name>_1): Likewise.
(*sse4_1_zero_extendv8qiv8hi2_3): Likewise.
(*sse4_1_zero_extendv8qiv8hi2_4): Likewise.
(*sse4_1_<code>v4qiv4si2<mask_name>_1): Likewise.
(*sse4_1_<code>v4hiv4si2<mask_name>_1): Likewise.
(*sse4_1_zero_extendv4hiv4si2_3): Likewise.
(*sse4_1_zero_extendv4hiv4si2_4): Likewise.
(*sse4_1_<code>v2hiv2di2<mask_name>_1): Likewise.
(*sse4_1_<code>v2siv2di2<mask_name>_1): Likewise.
(*sse4_1_zero_extendv2siv2di2_3): Likewise.
(*sse4_1_zero_extendv2siv2di2_4): Likewise.
(aesdec): Likewise.
(aesdeclast): Likewise.
(aesenc): Likewise.
(aesenclast): Likewise.
(pclmulqdq): Likewise.
(vgf2p8affineinvqb_<mode><mask_name>): Likewise.
(vgf2p8affineqb_<mode><mask_name>): Likewise.
(vgf2p8mulb_<mode><mask_name>): Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Handle legacy insns that only support GPR16 (3/5)

Disable EGPR usage for the legacy insns below in opcode map2/3 that have
a vex but no evex counterpart.

insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist

gcc/ChangeLog:

* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
prototype.
* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
function.
* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
and constraint jm to all non-evex alternatives, adjust
alternative outputs if evex reg is mentioned.
* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
and constraint jm/ja to all non-evex alternatives.
(ptesttf2): Likewise.
(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
(sse4_1_round<ssescalarmodesuffix>): Likewise.
(sse4_2_pcmpestri): Likewise.
(sse4_2_pcmpestrm): Likewise.
(sse4_2_pcmpestr_cconly): Likewise.
(sse4_2_pcmpistr): Likewise.
(sse4_2_pcmpistri): Likewise.
(sse4_2_pcmpistrm): Likewise.
(sse4_2_pcmpistr_cconly): Likewise.
(aesimc): Likewise.
(aeskeygenassist): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
tests.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Handle legacy insns that only support GPR16 (2/5)

These legacy insns in opcode map2/3 have a vex but no evex
counterpart, so disable EGPR for them by adjusting alternatives and
attr_gpr32.

insn list:
1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw
2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw
3. psignb/vpsignb, psignw/vpsignw, psignd/vpsignd
4. blendps/vblendps, blendpd/vblendpd
5. blendvps/vblendvps, blendvpd/vblendvpd
6. pblendvb/vpblendvb, pblendw/vpblendw
7. mpsadbw/vmpsadbw
8. dpps/vdpps, dppd/vdppd
9. pcmpeqq/vpcmpeqq, pcmpgtq/vpcmpgtq

gcc/ChangeLog:

* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3): Set
attr gpr32 0 and constraint jm/ja to all mem alternatives.
(ssse3_ph<plusminus_mnemonic>wv8hi3): Likewise.
(ssse3_ph<plusminus_mnemonic>wv4hi3): Likewise.
(avx2_ph<plusminus_mnemonic>dv8si3): Likewise.
(ssse3_ph<plusminus_mnemonic>dv4si3): Likewise.
(ssse3_ph<plusminus_mnemonic>dv2si3): Likewise.
(<ssse3_avx2>_psign<mode>3): Likewise.
(ssse3_psign<mode>3): Likewise.
(<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
(<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
(*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt): Likewise.
(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint): Likewise.
(<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
(<sse4_1_avx2>_mpsadbw): Likewise.
(<sse4_1_avx2>_pblendvb): Likewise.
(*<sse4_1_avx2>_pblendvb_lt): Likewise.
(sse4_1_pblend<ssemodesuffix>): Likewise.
(*avx2_pblend<ssemodesuffix>): Likewise.
(avx2_permv2ti): Likewise.
(*avx_vperm2f128<mode>_nozero): Likewise.
(*avx2_eq<mode>3): Likewise.
(*sse4_1_eqv2di3): Likewise.
(sse4_2_gtv2di3): Likewise.
(avx2_gt<mode>3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add
sse/vex intrinsic tests.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Handle legacy insn that only support GPR16 (1/5)

These legacy insns in opcode map0/1 only support GPR16 and do not have
vex/evex counterparts, so directly adjust constraints and add the gpr32
attr to the patterns.

insn list:
1. xsave/xsave64, xrstor/xrstor64
2. xsaves/xsaves64, xrstors/xrstors64
3. xsavec/xsavec64
4. xsaveopt/xsaveopt64
5. fxsave64/fxrstor64

gcc/ChangeLog:

* config/i386/i386.md (<xsave>): Set attr gpr32 0 and constraint
jm.
(<xsave>_rex64): Likewise.
(<xrstor>_rex64): Likewise.
(<xrstor>64): Likewise.
(fxsave64): Likewise.
(fxrstor64): Likewise.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add apxf check.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Handle GPR16 only vector move insns

For vector move insns like vmovdqa/vmovdqu, the evex counterparts
require an explicit 64/32/16/8 suffix. The use of these instructions is
prohibited under AVX10_1 or AVX512F, so we select vmovaps/vmovups for
vector load/store insns that contain EGPR if there is no AVX512VL, and
keep the original move insn selection otherwise.
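
The selection rule can be summarised by the following sketch. It is a
deliberate simplification: the function pick_vector_move and its three
boolean parameters are hypothetical, and the real logic in
ix86_get_ssemov handles many more modes and suffixes.

```c
#include <stdbool.h>

/* Hypothetical simplification of the mnemonic choice for an integer
   vector load/store.  uses_egpr says whether the address mentions
   r16-r31; has_avx512vl says whether AVX512VL is available.  */
static const char *pick_vector_move(bool aligned, bool uses_egpr,
                                    bool has_avx512vl)
{
    /* With EGPR but without AVX512VL, fall back to the FP moves.  */
    if (uses_egpr && !has_avx512vl)
        return aligned ? "vmovaps" : "vmovups";

    /* Otherwise keep the original move insn selection.  */
    return aligned ? "vmovdqa" : "vmovdqu";
}
```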

gcc/ChangeLog:

* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
adjust mnemonic for vmovdqu/vmovdqa.
* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
avx_noavx512f.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

In inline asm, we do not know if the insn can use EGPR, so disable EGPR
usage by default via mapping the common reg/mem constraint to non-EGPR
constraints.

The full list of mappings is:

  "g" -> "jrjmi"
  "r" -> "jr"
  "m" -> "jm"
  "<" -> "j<"
  ">" -> "j>"
  "o" -> "jo"
  "V" -> "jV"
  "p" -> "jp"
  "Bm" -> "ja"

For memory constraints, we add an option -mapx-inline-asm-use-gpr32
to allow/disallow gpr32 usage in any memory-related constraint, as
base_reg_class/index_reg_class cannot tell whether the asm insn
supports gpr32 or not.
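
The mapping is essentially a table lookup, as in the sketch below. It is
not the implementation of map_egpr_constraints: the table egpr_map and
the helper map_one_constraint are illustrative names, the helper handles
one constraint at a time, and the -mapx-inline-asm-use-gpr32 handling is
left out.

```c
#include <stddef.h>
#include <string.h>

struct constraint_map {
    const char *from;   /* common constraint */
    const char *to;     /* EGPR-prohibiting replacement */
};

/* Mapping table taken from the list in the commit message.  */
static const struct constraint_map egpr_map[] = {
    { "g", "jrjmi" }, { "r", "jr" }, { "m", "jm" },
    { "<", "j<" },    { ">", "j>" }, { "o", "jo" },
    { "V", "jV" },    { "p", "jp" }, { "Bm", "ja" },
};

/* Return the non-EGPR constraint for c, or c itself when no mapping
   applies.  */
static const char *map_one_constraint(const char *c)
{
    for (size_t i = 0; i < sizeof egpr_map / sizeof egpr_map[0]; i++)
        if (strcmp(c, egpr_map[i].from) == 0)
            return egpr_map[i].to;
    return c;
}
```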

gcc/ChangeLog:

* config/i386/i386.cc (map_egpr_constraints): New function to
map common constraints to EGPR prohibited constraints.
(ix86_md_asm_adjust): Calls map_egpr_constraints.
* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-inline-gpr-norex2.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Add backend hook for base_reg_class/index_reg_class.

Add backend helper functions to verify whether an rtx_insn can adopt EGPR
for the base/index reg of its memory operand. The verification rules are:
  1. For asm insns, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
  2. Disable EGPR for unrecognized insns.
  3. If which_alternative is not decided, loop through the enabled
  alternatives and check their attr_gpr32. Only enable EGPR when all
  enabled alternatives have attr_gpr32 = 1.
  4. If which_alternative is decided, enable/disable EGPR by its
  corresponding attr_gpr32.
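
The four rules can be written down as a small decision function. This is
a sketch on a made-up data structure: struct insn_sketch and
insn_may_use_egpr are hypothetical stand-ins for the rtx_insn queries
done by the real helper in i386.cc.

```c
#include <stdbool.h>

/* Hypothetical summary of what the backend knows about one insn.  */
struct insn_sketch {
    bool is_asm;             /* inline asm statement */
    bool recognized;         /* insn matched a pattern */
    int which_alternative;   /* -1 while not decided */
    int n_alternatives;
    const bool *enabled;     /* per-alternative "enabled" flag */
    const bool *gpr32;       /* per-alternative attr_gpr32 */
};

static bool insn_may_use_egpr(const struct insn_sketch *insn,
                              bool inline_asm_use_gpr32)
{
    if (insn->is_asm)                    /* rule 1 */
        return inline_asm_use_gpr32;
    if (!insn->recognized)               /* rule 2 */
        return false;
    if (insn->which_alternative < 0) {   /* rule 3 */
        for (int i = 0; i < insn->n_alternatives; i++)
            if (insn->enabled[i] && !insn->gpr32[i])
                return false;
        return true;
    }
    return insn->gpr32[insn->which_alternative];   /* rule 4 */
}
```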

gcc/ChangeLog:

* config/i386/i386-protos.h (ix86_insn_base_reg_class): New
prototype.
(ix86_regno_ok_for_insn_base_p): Likewise.
(ix86_insn_index_reg_class): Likewise.
* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
New helper function to scan the insn.
(ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS.
(ix86_regno_ok_for_insn_base_p): Likewise for base regno.
(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
* config/i386/i386.h (INSN_BASE_REG_CLASS): Define.
(REGNO_OK_FOR_INSN_BASE_P): Likewise.
(INSN_INDEX_REG_CLASS): Likewise.
(enum reg_class): Add INDEX_GPR16.
(GENERAL_GPR16_REGNO_P): Define.
* config/i386/i386.md (gpr32): New attribute.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Add register and memory constraints that disallow EGPR

For APX, as we extended the GENERAL_REG_CLASS, new constraints are
needed to restrict insns that cannot adopt EGPR in either their reg or
memory operands. We added a series of constraints for the general/backend
ones that relate to GPR usage. All of them are prefixed with "j" to
indicate that the constraint does not allow EGPR.

gcc/ChangeLog:

* config/i386/constraints.md (jr): New register constraint
that prohibits EGPR.
(jR): Constraint that forces usage of EGPR.
(jm): New memory constraint that prohibits EGPR.
(ja): Likewise for Bm constraint.
(jb): Likewise for Tv constraint.
(j<): New auto-dec memory constraint that prohibits EGPR.
(j>): Likewise for ">" constraint.
(jo): Likewise for "o" constraint.
(jV): Likewise for "V" constraint.
(jp): Likewise for "p" constraint.
* config/i386/i386.h (enum reg_class): Add new reg class
GENERAL_GPR16.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] Add 16 new integer general purpose registers

Extend GENERAL_REGS with the extra r16-r31 registers, handled like the
REX registers and named REX2 registers. They are only enabled under
TARGET_APX_EGPR.

gcc/ChangeLog:

* config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p):
New function prototype.
* config/i386/i386.cc (regclass_map): Add mapping for 16 new
general registers.
(debugger64_register_map): Likewise.
(ix86_conditional_register_usage): Clear REX2 register when APX
disabled.
(ix86_code_end): Add handling for REX2 reg.
(print_reg): Likewise.
(ix86_output_jmp_thunk_or_indirect): Likewise.
(ix86_output_indirect_branch_via_reg): Likewise.
(ix86_attr_length_vex_default): Likewise.
(ix86_emit_save_regs): Adjust to allow saving r31.
(ix86_register_priority): Set REX2 reg priority same as REX.
(x86_extended_reg_mentioned_p): Add check for REX2 regs.
(x86_extended_rex2reg_mentioned_p): New function.
* config/i386/i386.h (CALL_USED_REGISTERS): Add new extended
registers.
(REG_ALLOC_ORDER): Likewise.
(FIRST_REX2_INT_REG): Define.
(LAST_REX2_INT_REG): Ditto.
(GENERAL_REGS): Add 16 new registers.
(INT_SSE_REGS): Likewise.
(FLOAT_INT_REGS): Likewise.
(FLOAT_INT_SSE_REGS): Likewise.
(INT_MASK_REGS): Likewise.
(ALL_REGS): Likewise.
(REX2_INT_REG_P): Define.
(REX2_INT_REGNO_P): Ditto.
(GENERAL_REGNO_P): Add REX2_INT_REGNO_P.
(REGNO_OK_FOR_INDEX_P): Ditto.
(REG_OK_FOR_INDEX_NONSTRICT_P): Add new extended registers.
* config/i386/i386.md: Add 16 new integer general
registers.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: New test.
* gcc.target/i386/apx-spill_to_egprs-1.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX_EGPR] Initial support for APX_F

Add -mapx-features= enumeration to separate subfeatures of APX_F.
-mapxf is treated same as previous ISA flag, while it sets
-mapx-features=apx_all that enables all subfeatures.
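
As a rough illustration of how one enumeration can separate the subfeatures,
the sketch below models them as bit flags. The enumerator names follow the
ChangeLog entry below; the numeric values and the helper
apx_feature_enabled are assumptions made for this example and are not taken
from the patch.

```c
/* Illustrative sketch only: names mirror the ChangeLog, values are assumed.  */
enum apx_features {
  apx_none      = 0,
  apx_egpr      = 1 << 0,
  apx_push2pop2 = 1 << 1,
  apx_ndd       = 1 << 2,
  apx_all       = apx_egpr | apx_push2pop2 | apx_ndd
};

/* Hypothetical helper: return nonzero if FEATURE is part of SELECTED.  */
int apx_feature_enabled (int selected, enum apx_features feature)
{
  return (selected & feature) != 0;
}
```

With this layout, selecting apx_all turns on every subfeature test at once,
which matches the description of -mapxf above.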

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
(XCR_APX_F_ENABLED_MASK): Likewise.
(get_available_features): Detect APX_F under
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
(OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
(ix86_handle_option): Handle -mapxf.
* common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
* common/config/i386/i386-isas.h: Add entry for APX_F.
* config/i386/cpuid.h (bit_APX_F): New.
* config/i386/i386.h (bit_APX_F): (TARGET_APX_EGPR,
TARGET_APX_PUSH2POP2, TARGET_APX_NDD): New define.
* config/i386/i386-opts.h (enum apx_features): New enum.
* config/i386/i386-isa.def (APX_F): New DEF_PTA.
* config/i386/i386-options.cc (ix86_function_specific_save):
Save ix86_apx_features.
(ix86_function_specific_restore): Restore it.
(ix86_valid_target_attribute_inner_p): Add mapxf.
(ix86_option_override_internal): Set ix86_apx_features for PTA
and TARGET_APX_F. Also reports error when APX_F is set but not
having TARGET_64BIT.
* config/i386/i386.opt: (-mapxf): New ISA flag option.
(-mapx=): New enumeration option.
(apx_features): New enum type.
(apx_none): New enum value.
(apx_egpr): Likewise.
(apx_push2pop2): Likewise.
(apx_ndd): Likewise.
(apx_all): Likewise.
* doc/invoke.texi: Document mapxf.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-1.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] middle-end: Add index_reg_class with insn argument.

Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
Add index_reg_class with insn argument for lra/reload usage.

gcc/ChangeLog:

* addresses.h (index_reg_class): New wrapper function like
base_reg_class.
* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (index_part_to_reg): Pass index_class.
(process_address_1): Calls index_reg_class with curr_insn and
replace INDEX_REG_CLASS with its return value index_cl.
* reload.cc (find_reloads_address): Likewise.
(find_reloads_address_1): Likewise.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

[APX EGPR] middle-end: Add insn argument to base_reg_class

Current reload infrastructure does not support selective base_reg_class
for backend insn. Add new macros with insn parameters to base_reg_class
for lra/reload usage.

gcc/ChangeLog:

* addresses.h (base_reg_class): Add insn argument and new macro
INSN_BASE_REG_CLASS.
(regno_ok_for_base_p_1): Add insn argument and new macro
REGNO_OK_FOR_INSN_BASE_P.
(regno_ok_for_base_p): Add insn argument and pass it to ok_for_base_p_1.
* doc/tm.texi: Document INSN_BASE_REG_CLASS and
REGNO_OK_FOR_INSN_BASE_P.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (process_address_1): Pass insn to
base_reg_class.
(curr_insn_transform): Ditto.
* reload.cc (find_reloads): Ditto.
(find_reloads_address): Ditto.
(find_reloads_address_1): Ditto.
(find_reloads_subreg_address): Ditto.
* reload1.cc (maybe_fix_stack_asms): Ditto.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

RISC-V: Add more run test for FP rounding autovec

For _Float16 types, add run test for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc

For float and double, add run test for:
* roundeven

The zfa extension is required for these run test cases; for rv64 the
simulation target_board may look like the following.

target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

rs6000: use mtvsrws to move sf from si p9

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the bitcast from SI to SF (or lowpart DI to SF).

For example:
*(long long*)buff = di;
float f = *(float*)(buff);

"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".
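
For reference, the same lowpart bitcast can be written without pointer casts.
This is a minimal sketch, assuming a 32-bit IEEE-754 float; the function name
sf_from_lowpart_di is hypothetical and is not part of the patch or the test.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the low 32 bits of DI as a float (SI -> SF bitcast).  */
float sf_from_lowpart_di (long long di)
{
  uint32_t si = (uint32_t) di;   /* keep only the low part */
  float f;
  memcpy (&f, &si, sizeof f);    /* bitcast, no value conversion */
  return f;
}
```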

PR target/108338

gcc/ChangeLog:

* config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
for P9.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.

rs6000: optimize moving to sf from highpart di

Currently, we have the pattern "movsf_from_si2" which was trying
to support moving high part DI to SF.

But current pattern only accepts "ashiftrt":
XX:SF=bitcast:SF(subreg(YY:DI>>32),0), but actually "lshiftrt" should
also be ok.
And current pattern only supports BE.

Here, updating the pattern to support BE and "lshiftrt".
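
The reason both shift kinds are acceptable can be sketched in C: once the
shifted value is truncated to 32 bits, the bits that differ between an
arithmetic and a logical shift are discarded. The function names below are
hypothetical, a 32-bit IEEE-754 float is assumed, and right-shifting a
negative signed value is implementation-defined in ISO C (GCC performs an
arithmetic shift).

```c
#include <stdint.h>
#include <string.h>

/* Bitcast a 32-bit pattern to float (assumes 32-bit IEEE-754 float).  */
float bits_to_sf (uint32_t si)
{
  float f;
  memcpy (&f, &si, sizeof f);
  return f;
}

/* High part taken with a logical shift right ("lshiftrt").  */
float sf_from_highpart_lshr (int64_t di)
{
  return bits_to_sf ((uint32_t) ((uint64_t) di >> 32));
}

/* High part taken with a signed shift right ("ashiftrt" with GCC).
   Only the low 32 bits of the shifted value are kept, so the sign
   fill in the upper bits does not matter.  */
float sf_from_highpart_ashr (int64_t di)
{
  return bits_to_sf ((uint32_t) (di >> 32));
}
```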

PR target/108338

gcc/ChangeLog:

* config/rs6000/predicates.md (lowpart_subreg_operator): New
define_predicate.
* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si2): Rename to ...
(movsf_from_si2_<code>): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.

RISC-V: Bugfix for legitimize address PR/111634

Given we have RTL as below.

(plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
                  (const_int 8 [0x8]))
         (lo_sum:DI (reg:DI 167)
                    (symbol_ref:DI ("f") [flags 0x86] <var_decl 0x7fa96ea1cc60 f>)
))

When handling the (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case,
the fp operand is the lo_sum shown above. The code assumes that fp is a
register, which is not true here, so it ICEs when GCC is built with the option
--enable-checking=rtl.

This patch fixes it by adding a REG_P check to ensure the operand
is a register. The test case gcc/testsuite/gcc.dg/pr109417.c covers this
fix when GCC is built with --enable-checking=rtl.
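
The shape of the guard can be sketched with a much simplified stand-in for an
RTL node. Everything in the block (the struct, the enum, EXAMPLE_FP_REGNUM
and the function name) is hypothetical and only illustrates checking the node
kind before reading a register number; it is not the code in riscv.cc.

```c
#include <stddef.h>

/* Hypothetical, much simplified stand-in for an RTL node.  */
enum node_kind { KIND_REG, KIND_LO_SUM, KIND_CONST_INT };

struct node {
  enum node_kind kind;
  unsigned regno;   /* meaningful only when kind == KIND_REG */
};

/* Hypothetical register number used only by this example.  */
#define EXAMPLE_FP_REGNUM 8

/* Check that X really is a register before looking at its number,
   in the spirit of testing REG_P before using REGNO.  */
int is_example_fp_reg (const struct node *x)
{
  return x != NULL
         && x->kind == KIND_REG
         && x->regno == EXAMPLE_FP_REGNUM;
}
```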

PR target/111634

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_address): Ensure
object is a REG before extracting its REGNO.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix scan-assembler-times of RVV test case

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler times.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.

Daily bump.

i386: Implement doubleword shift left by 1 bit using add+adc.

This patch tweaks the i386 back-end's ix86_split_ashl to implement
doubleword left shifts by 1 bit, using an add followed by an add-with-carry
(i.e. a doubleword x+x) instead of using the x86's shld instruction.
The replacement sequence both requires fewer bytes and is faster on
both Intel and AMD architectures (from Agner Fog's latency tables and
confirmed by my own micro-benchmarking).

For the test case:
__int128 foo(__int128 x) { return x << 1; }

with -O2 we previously generated:

foo: movq    %rdi, %rax
        movq    %rsi, %rdx
        shldq   $1, %rdi, %rdx
        addq    %rdi, %rax
        ret

with this patch we now generate:

foo: movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
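
The equivalence behind the new sequence can be checked in portable C. The
sketch below is only an illustration: struct dword and both function names
are hypothetical, and the 128-bit value is modelled as two 64-bit words.

```c
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value (not a GCC type).  */
struct dword { uint64_t lo, hi; };

/* x << 1 computed as x + x: "add" on the low words, then
   "adc" (add with carry) on the high words.  */
struct dword shl1_add_adc (struct dword x)
{
  struct dword r;
  r.lo = x.lo + x.lo;              /* add: may wrap around */
  uint64_t carry = r.lo < x.lo;    /* 1 if the low addition wrapped */
  r.hi = x.hi + x.hi + carry;      /* adc */
  return r;
}

/* Reference: the top bit of lo moves into hi, as shld $1 does.  */
struct dword shl1_shld (struct dword x)
{
  struct dword r;
  r.hi = (x.hi << 1) | (x.lo >> 63);
  r.lo = x.lo << 1;
  return r;
}
```

Both functions return the same words for every input, because the carry out
of the low addition is exactly the bit that shld moves into the high word.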

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
one into add3_cc_overflow_1 followed by add3_carry.
* config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from
"*add<mode>3_cc_overflow_1" to provide generator function.

gcc/testsuite/ChangeLog
* gcc.target/i386/ashldi3-2.c: New 32-bit test case.
* gcc.target/i386/ashlti3-3.c: New 64-bit test case.

Makefile.tpl: disable -Werror for feedback stage [PR111663]

Without the change profiled bootstrap fails for various warnings on
master branch as:

    $ ../gcc/configure
    $ make profiledbootstrap
    ...
    gcc/genmodes.cc: In function ‘int main(int, char**)’:
    gcc/genmodes.cc:2152:1: error: ‘gcc/build/genmodes.gcda’ profile count data file not found [-Werror=missing-profile]
    ...
    gcc/gengtype-parse.cc: In function ‘void parse_error(const char*, ...)’:
    gcc/gengtype-parse.cc:142:21: error: ‘%s’ directive argument is null [-Werror=format-overflow=]

The change removes -Werror just like autofeedback does today.

/

PR bootstrap/111663
* Makefile.tpl (STAGEfeedback_CONFIGURE_FLAGS): Disable -Werror.
* Makefile.in: Regenerate.

i386: Split lea into shorter left shift by 2 or 3 bits with -Oz.

This patch avoids long lea instructions for performing x<<2 and x<<3
by splitting them into shorter sal and move (or xchg) instructions.
Because this increases the number of instructions, but reduces the
total size, it's suitable for -Oz (but not -Os).

The impact can be seen in the new test case:

int foo(int x) { return x<<2; }
int bar(int x) { return x<<3; }
long long fool(long long x) { return x<<2; }
long long barl(long long x) { return x<<3; }

where with -O2 we generate:

foo:    lea    0x0(,%rdi,4),%eax // 7 bytes
        retq
bar:    lea    0x0(,%rdi,8),%eax // 7 bytes
        retq
fool:   lea    0x0(,%rdi,4),%rax // 8 bytes
        retq
barl:   lea    0x0(,%rdi,8),%rax // 8 bytes
        retq

and with -Oz we now generate:

foo:    xchg   %eax,%edi // 1 byte
        shl    $0x2,%eax // 3 bytes
        retq
bar:    xchg   %eax,%edi // 1 byte
        shl    $0x3,%eax // 3 bytes
        retq
fool:   xchg   %rax,%rdi // 2 bytes
        shl    $0x2,%rax // 4 bytes
        retq
barl:   xchg   %rax,%rdi // 2 bytes
        shl    $0x3,%rax // 4 bytes
        retq

Over the entirety of the CSiBE code size benchmark this saves 1347
bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
Conveniently, there's already a backend function in i386.cc for
deciding whether to split an lea into its component instructions,
ix86_avoid_lea_for_addr; all that's required is an additional
clause checking for -Oz (i.e. optimize_size > 1).

2023-10-06  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
to perform left shifts into shorter instructions with -Oz.

gcc/testsuite/ChangeLog
* gcc.target/i386/lea-2.c: New test case.

RISC-V: const: hide mvconst splitter from IRA

Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 additional test

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |

   rv32imac/  ilp32/ medlow |  416 /   103 |   13 /     6 |   67 /    12 |
 rv32imafdc/ ilp32d/ medlow |  416 /   103 |   13 /     6 |   24 /     4 |
   rv64imac/   lp64/ medlow |  417 /   104 |    9 /     3 |   67 /    12 |
 rv64imafdc/  lp64d/ medlow |  416 /   103 |    5 /     2 |    6 /     1 |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law <jeffreyalaw@gmail.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Docs: Minimally document standard C/C++ attribute syntax.

gcc/ChangeLog:

* doc/extend.texi (Function Attributes): Mention standard attribute
syntax.
(Variable Attributes): Likewise.
(Type Attributes): Likewise.
(Attribute Syntax): Likewise.

amdgcn: switch mov insns to compact syntax

The move instructions typically have many alternatives (and I'm about to add
more) so are good candidates for the new syntax.

This patch only converts the patterns where there are no significant changes to
the generated files. The other patterns can be converted another time.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (*mov<mode>): Convert to compact syntax.
(mov<mode>_exec): Likewise.
(mov<mode>_sgprbase): Likewise.
* config/gcn/gcn.md (*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.

amdgcn: silence warning

gcc/ChangeLog:

* config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning.

libgomp.texi: Document some of the device-memory routines

libgomp/ChangeLog:

* libgomp.texi (Device Memory Routines): New.

MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a & b`

Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)` but since in this case a is a comparison
fold will change `a & b` back into `vec_cond(a,b,0)` which causes an
infinite loop.
The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*)
only for GIMPLE so we don't get an infinite loop for fold any more.
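
The two forms that fold and match keep converting into each other are
equivalent lane by lane, which is why neither direction is wrong on its own.
The scalar sketch below models one vector lane; the function names are
hypothetical, and it assumes that a mask lane is either 0 or all ones.

```c
#include <stdint.h>

/* One lane of vec_cond: MASK is either 0 or all ones (-1).  */
int32_t lane_cond (int32_t mask, int32_t if_true, int32_t if_false)
{
  return mask ? if_true : if_false;
}

/* vec_cond(vec_cond(a, b, 0), c, d)  */
int32_t nested_form (int32_t a, int32_t b, int32_t c, int32_t d)
{
  return lane_cond (lane_cond (a, b, 0), c, d);
}

/* vec_cond(a & b, c, d)  */
int32_t and_form (int32_t a, int32_t b, int32_t c, int32_t d)
{
  return lane_cond (a & b, c, d);
}
```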

Note this is a latent bug since these patterns were added in r11-2577-g229752afe3156a
and was exposed by r14-3350-g47b833a9abe1 where we are now able to remove a VIEW_CONVERT_EXPR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/111699

gcc/ChangeLog:

* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111699-1.c: New test.

ipa: Remove ipa_bits

The following patch removes ipa_bits struct pointer/vector from ipa
jump functions and ipa cp transformations.

The reason is because the struct uses widest_int to represent
mask/value pair, which in the RFC patches to allow larger precisions
for wide_int/widest_int is GC unfriendly because those types become
non-trivially default constructible/copyable/destructible.
One option would be to use trailing_wide_int for that instead, but
as pointed out by Aldy, irange_storage which we already use under
the hood for ipa_vr when type of parameter is integral or pointer
already stores the mask/value pair because VRP now does the bit cp
as well.
So, this patch just uses m_vr to store both the value range and
the bitmask.  There is still separate propagation of the
ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but
when storing we merge the two into the same container.
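
As an illustration of what a mask/value pair can express for pointers, the
sketch below derives an alignment from it. It assumes the convention that a
set mask bit marks an unknown bit and that value holds the known bits; the
struct and function names are hypothetical and this is not the code from
ipa-prop.cc.

```c
#include <stdint.h>

/* Hypothetical mask/value pair: a set bit in MASK means "unknown",
   VALUE holds the bits that are known.  */
struct known_bits { uint64_t value; uint64_t mask; };

struct ptr_alignment { uint64_t align; uint64_t misalign; };

/* The lowest unknown bit bounds the alignment we can prove; the known
   bits below it give the misalignment.  An ALIGN of 0 means that no
   bit is unknown, so no alignment is derived here.  */
struct ptr_alignment alignment_from_bits (struct known_bits b)
{
  struct ptr_alignment r = { 0, 0 };
  uint64_t lowest_unknown = b.mask & (~b.mask + 1);  /* lowest set bit */
  if (lowest_unknown != 0)
    {
      r.align = lowest_unknown;
      r.misalign = b.value & (lowest_unknown - 1);
    }
  return r;
}
```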

2023-10-06  Jakub Jelinek  <jakub@redhat.com>

* ipa-prop.h (ipa_bits): Remove.
(struct ipa_jump_func): Remove bits member.
(struct ipcp_transformation): Remove bits member, adjust
ctor and dtor.
(ipa_get_ipa_bits_for_value): Remove.
* ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove.
(ipa_bits_hash_table): Remove.
(ipa_print_node_jump_functions_for_edge): Don't print bits.
(ipa_get_ipa_bits_for_value): Remove.
(ipa_set_jfunc_bits): Remove.
(ipa_compute_jump_functions_for_edge): For pointers query
pointer alignment before ipa_set_jfunc_vr and update_bitmask
in there.  For integral types, just rely on bitmask already
being handled in value ranges.
(ipa_check_create_edge_args): Don't create ipa_bits_hash_table.
(ipcp_transformation_initialize): Neither here.
(ipcp_transformation_t::duplicate): Don't copy bits vector.
(ipa_write_jump_function): Don't stream bits here.
(ipa_read_jump_function): Neither here.
(useful_ipcp_transformation_info_p): Don't test bits vec.
(write_ipcp_transformation_info): Don't stream bits here.
(read_ipcp_transformation_info): Neither here.
(ipcp_get_parm_bits): Get mask and value from m_vr rather
than bits.
(ipcp_update_bits): Remove.
(ipcp_update_vr): For pointers, set_ptr_info_alignment from
bitmask stored in value range.
(ipcp_transform_function): Don't test bits vector, don't call
ipcp_update_bits.
* ipa-cp.cc (propagate_bits_across_jump_function): Don't use
jfunc->bits, instead get mask and value from jfunc->m_vr.
(ipcp_store_bits_results): Remove.
(ipcp_store_vr_results): Incorporate parts of
ipcp_store_bits_results here, merge the bitmasks with value
range if both are supplied.
(ipcp_driver): Don't call ipcp_store_bits_results.
* ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits
clearing.

RISC-V: Use stdint-gcc.h in rvv testsuite

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

Tested using glibc rv32gcv and rv64gcv on r14-4381-g7eb5ce7f58e.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h:
Replace stdint.h with stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111232.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr111255.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Update comments for FP rounding related autovec

Some comments are out of date; this patch updates them.

gcc/ChangeLog:

* config/riscv/autovec.md: Update comments.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

RISC-V: Test memcpy inlined on riscv_v

Since r14-4358-g9464e72bcc9 riscv_v targets use vector instructions to
perform a memcpy. We no longer expect memcpy for riscv_v targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90263.c: Skip riscv_v targets.
* gcc.target/riscv/rvv/base/pr90263.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>

Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h

2023-10-05 John David Anglin <danglin@gcc.gnu.org>

* config/pa/pa32-linux.h (MALLOC_ABI_ALIGNMENT): Delete.

libstdc++: [_GLIBCXX_INLINE_VERSION] Add missing symbols

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu-versioned-namespace.ver: Add missing symbols
for _Float{16,32,64,128,32x,64x,128x}.

Create a fast VRP pass

* timevar.def (TV_TREE_FAST_VRP): New.
* tree-pass.h (make_pass_fast_vrp): New prototype.
* tree-vrp.cc (class fvrp_folder): New.
(fvrp_folder::fvrp_folder): New.
(fvrp_folder::~fvrp_folder): New.
(fvrp_folder::value_of_expr): New.
(fvrp_folder::value_on_edge): New.
(fvrp_folder::value_of_stmt): New.
(fvrp_folder::pre_fold_bb): New.
(fvrp_folder::post_fold_bb): New.
(fvrp_folder::pre_fold_stmt): New.
(fvrp_folder::fold_stmt): New.
(execute_fast_vrp): New.
(pass_data_fast_vrp): New.
(pass_vrp::execute): Check for fast VRP pass.
(make_pass_fast_vrp): New.

Add a dom based ranger for fast VRP.

Provide a dominator based implementation of a range query.

* gimple-range.cc (dom_ranger::dom_ranger): New.
(dom_ranger::~dom_ranger): New.
(dom_ranger::range_of_expr): New.
(dom_ranger::edge_range): New.
(dom_ranger::range_on_edge): New.
(dom_ranger::range_in_bb): New.
(dom_ranger::range_of_stmt): New.
(dom_ranger::maybe_push_edge): New.
(dom_ranger::pre_bb): New.
(dom_ranger::post_bb): New.
* gimple-range.h (class dom_ranger): New.

Add outgoing range vector calculation API

Provide a GORI API which can produce a range vector for all outgoing
ranges on an edge without any of the other infrastructure.

* gimple-range-gori.cc (gori_stmt_info::gori_stmt_info): New.
(gori_calc_operands): New.
(gori_on_edge): New.
(gori_name_helper): New.
(gori_name_on_edge): New.
* gimple-range-gori.h (gori_on_edge): New prototype.
(gori_name_on_edge): New prototype.

ipa-utils: avoid uninitialized probabilities on ICF [PR111559]

r14-3459-g0c78240fd7d519 "Check that passes do not forget to define profile"
exposed check failures in cases when gcc produces uninitialized profile
probabilities. In case of PR/111559 uninitialized profile is generated
by edges executed 0 times reported by IPA profile:

    $ gcc -O2 -fprofile-generate pr111559.c -o b -fopt-info
    $ ./b
    $ gcc -O2 -fprofile-use -fprofile-correction pr111559.c -o b -fopt-info

    pr111559.c: In function 'rule1':
    pr111559.c:6:13: error: probability of edge 3->4 not initialized
        6 | static void rule1(void) { if (p) edge(); }
          |             ^~~~~
    during GIMPLE pass: fixup_cfg
    pr111559.c:6:13: internal compiler error: verify_flow_info failed

The change conservatively ignores updates with zero execution counts and
uses initially assigned probabilities (`always` probability in case of
the example).
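
The idea of skipping updates with a zero denominator can be sketched as
follows. The struct and function are hypothetical simplifications written for
this note; they are not the profile_probability API or the code in
ipa_merge_profiles.

```c
#include <stdint.h>

/* Hypothetical, simplified edge probability: VALUE is meaningful
   only when INITIALIZED is nonzero.  */
struct edge_prob { int initialized; double value; };

/* Recompute PROB from merged counts.  When the block was executed
   zero times there is no meaningful ratio, so the previously
   assigned probability is kept instead of being overwritten.  */
void merge_edge_prob (struct edge_prob *prob,
                      uint64_t edge_count, uint64_t block_count)
{
  if (block_count == 0)
    return;   /* keep the initially assigned probability */
  prob->value = (double) edge_count / (double) block_count;
  prob->initialized = 1;
}
```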

PR ipa/111283
PR gcov-profile/111559

gcc/
* ipa-utils.cc (ipa_merge_profiles): Avoid producing
uninitialized probabilities when merging counters with zero
denominators.

gcc/testsuite/
* gcc.dg/tree-prof/pr111559.c: New test.

secpol: consistent indentation

86% of the document uses 4 spaces; adjust the remaining 14%.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
ChangeLog:

* SECURITY.txt: Fix up indentation.

secpol: add grammatically missing commas / remove one excess instance

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
ChangeLog:

* SECURITY.txt: Fix up commas.

i386: Improve memory copy from named address space [PR111657]

The stringop strategy selection algorithm falls back to a libcall strategy
when it exhausts its pool of available strategies. The memory area copy
function (memcpy) is not available from the system library for non-default
address spaces, so the compiler emits the most trivial byte-at-a-time
copy loop instead.

The compiler should instead emit an optimized copy loop as a fallback for
non-default address spaces.
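
The intended selection logic can be sketched as below. The enum, the
function names and the candidate list are hypothetical and only model the
behaviour described above; the real alg_usable_p and decide_alg take
different arguments.

```c
/* Hypothetical strategy names for this sketch.  */
enum copy_alg { ALG_LIBCALL, ALG_LOOP, ALG_LOOP_1_BYTE };

/* A libcall cannot be used for a non-default address space, because
   memcpy from the system library only handles the default one.  */
int alg_usable (enum copy_alg alg, int default_address_space)
{
  if (alg == ALG_LIBCALL && !default_address_space)
    return 0;
  return 1;
}

/* Return the first usable candidate.  When the pool is exhausted,
   fall back to a libcall for the default address space and to an
   optimized loop otherwise.  */
enum copy_alg choose_alg (const enum copy_alg *candidates, int n,
                          int default_address_space)
{
  for (int i = 0; i < n; i++)
    if (alg_usable (candidates[i], default_address_space))
      return candidates[i];
  return default_address_space ? ALG_LIBCALL : ALG_LOOP;
}
```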

PR target/111657

gcc/ChangeLog:

* config/i386/i386-expand.cc (alg_usable_p): Reject libcall
strategy for non-default address spaces.
(decide_alg): Use loop strategy as a fallback strategy for
non-default address spaces.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111657.c: New test.

contrib: add mdcompact

Hello all,

this patch checks in mdcompact, the tool written in elisp that I used
to mass convert all the multi-choice patterns in the aarch64 back-end to
the new compact syntax.

I tested it on Emacs 29 (it might run on older versions as well, I'm not
sure) and also verified that it runs cleanly on a few other back-ends (arm,
loongarch).

The tool can be used to convert a single pattern, an open buffer or
all md files in a directory.

The tool might need further adjustment to run on some specific
back-ends; in that case I'm very happy to help.

This patch was pre-approved here [1].

Best Regards

Andrea Corallo

[1] <https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631830.html>

contrib/ChangeLog

* mdcompact/mdcompact-testsuite.el: New file.
* mdcompact/mdcompact.el: Likewise.
* mdcompact/tests/1.md: Likewise.
* mdcompact/tests/1.md.out: Likewise.
* mdcompact/tests/2.md: Likewise.
* mdcompact/tests/2.md.out: Likewise.
* mdcompact/tests/3.md: Likewise.
* mdcompact/tests/3.md.out: Likewise.
* mdcompact/tests/4.md: Likewise.
* mdcompact/tests/4.md.out: Likewise.
* mdcompact/tests/5.md: Likewise.
* mdcompact/tests/5.md.out: Likewise.
* mdcompact/tests/6.md: Likewise.
* mdcompact/tests/6.md.out: Likewise.
* mdcompact/tests/7.md: Likewise.
* mdcompact/tests/7.md.out: Likewise.

LibF7: Remove uses of attribute pure.

libgcc/config/avr/libf7/
* libf7.h (F7_PURE): Remove all occurrences.
* libf7.c: Same.

LibF7: Use monic denominator polynomials to save a multiplication.

libgcc/config/avr/libf7/
* libf7.h (F7_FLAGNO_plusx, F7_FLAG_plusx): New macros.
* libf7.c (f7_horner): Handle F7_FLAG_plusx in highest coefficient.
* libf7-const.def [F7MOD_atan_]: Denominator: Set F7_FLAG_plusx
and omit highest term.
[F7MOD_asinacos_]: Use rational function with normalized denominator.
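
Why a monic polynomial saves a multiplication can be seen in a plain Horner
scheme. The sketch below uses double rather than LibF7's own number type, and
both function names are hypothetical; it is not the f7_horner implementation.

```c
#include <assert.h>
#include <stddef.h>

/* General Horner scheme.  COEFFS holds n + 1 coefficients from the
   highest degree down to the constant term; uses n multiplications.  */
double horner (const double *coeffs, size_t n, double x)
{
  double acc = coeffs[0];
  for (size_t i = 1; i <= n; i++)
    acc = acc * x + coeffs[i];
  return acc;
}

/* Monic variant of degree n: the leading coefficient is implicitly 1,
   so LOWER holds only the remaining n coefficients.  The first step
   is x + lower[0], which needs no multiplication, leaving n - 1.  */
double horner_monic (const double *lower, size_t n, double x)
{
  assert (n >= 1);
  double acc = x + lower[0];
  for (size_t i = 1; i < n; i++)
    acc = acc * x + lower[i];
  return acc;
}
```

For x^2 + 3x + 2, horner is called with {1, 3, 2} and horner_monic with
{3, 2}; both are passed n = 2 and evaluate the same polynomial.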

sreal: Fix typo in function name

My earlier version of the ipa_bits removal patch resulted in self-test
failures in sreal.  When debugging it, I was really confused that I couldn't
find verify_arithmetics function in the source.  Turns out it had bad
spelling...

2023-10-05  Jakub Jelinek  <jakub@redhat.com>

* sreal.cc (verify_aritmetics): Rename to ...
(verify_arithmetics): ... this.
(sreal_verify_arithmetics): Adjust caller.

Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"

This reverts commit 1be18ea110a2d69570dbc494588a7c73173883be.

As reported in PR bootstrap/111688, it broke ppc64le bootstrap because
of a debug-compare failure.

RISC-V: Remove @ of vec_series

gcc/ChangeLog:

* config/riscv/autovec.md (@vec_series<mode>): Remove @.
(vec_series<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_const_vector): Ditto.
(shuffle_decompress_patterns): Ditto.

arc: Update tests predicates when using linux toolchain.

gcc/testsuite:

* gcc.target/arc/enter-dw2-1.c: Remove tests when using linux
build.
* gcc.target/arc/tls-ld.c: Update test.
* gcc.target/arc/tls-le.c: Likewise.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arc: Remove obsolete ccfsm instruction predication mechanism

Remove old ccfsm responsible for conditional execution support in ARC.
This machinery is not needed as the current gcc conditional execution
support is mature.

gcc/

* config/arc/arc-passes.def: Remove arc_ifcvt pass.
* config/arc/arc-protos.h (arc_ccfsm_branch_deleted_p): Remove.
(arc_ccfsm_record_branch_deleted): Likewise.
(arc_ccfsm_cond_exec_p): Likewise.
(arc_ccfsm): Likewise.
(arc_ccfsm_record_condition): Likewise.
(make_pass_arc_ifcvt): Likewise.
* config/arc/arc.cc (arc_ccfsm): Remove.
(arc_ccfsm_current): Likewise.
(ARC_CCFSM_BRANCH_DELETED_P): Likewise.
(ARC_CCFSM_RECORD_BRANCH_DELETED): Likewise.
(ARC_CCFSM_COND_EXEC_P): Likewise.
(CCFSM_ISCOMPACT): Likewise.
(CCFSM_DBR_ISCOMPACT): Likewise.
(machine_function): Remove ccfsm related fields.
(arc_ifcvt): Remove pass.
(arc_print_operand): Remove `#` punct operand and other ccfsm
related code.
(arc_ccfsm_advance): Remove.
(arc_ccfsm_at_label): Likewise.
(arc_ccfsm_record_condition): Likewise.
(arc_ccfsm_post_advance): Likewise.
(arc_ccfsm_branch_deleted_p): Likewise.
(arc_ccfsm_record_branch_deleted): Likewise.
(arc_ccfsm_cond_exec_p): Likewise.
(arc_get_ccfsm_cond): Likewise.
(arc_final_prescan_insn): Remove ccfsm references.
(arc_internal_label): Likewise.
(arc_reorg): Likewise.
(arc_output_libcall): Likewise.
* config/arc/arc.md: Remove ccfsm references and update related
instruction patterns.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arc: Remove '^' print punct character

The '^' was used to print the '@' character in the output assembly. This is
no longer required by the ARC binutils. Remove it.

gcc/

* config/arc/arc.cc (arc_init): Remove '^' punct char.
(arc_print_operand): Remove related code.
* config/arc/arc.md: Update patterns which use '%&'.

gcc/testsuite/

* gcc.target/arc/loop-3.c: Update test.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arc: Update/remove ARC specific tests

Update tests and remove old mtune-* tests.

gcc/testsuite

* gcc.target/arc/add_n-combine.c: Recognize add2 instruction.
* gcc.target/arc/firq-4.c: FP register is a temp reg. Update test.
* gcc.target/arc/firq-6.c: Likewise.
* gcc.target/arc/mtune-ARC600.c: Remove test.
* gcc.target/arc/mtune-ARC601.c: Likewise.
* gcc.target/arc/mtune-ARC700-xmac: Likewise.
* gcc.target/arc/mtune-ARC700.c: Likewise.
* gcc.target/arc/mtune-ARC725D.c: Likewise.
* gcc.target/arc/mtune-ARC750D.c: Likewise.
* gcc.target/arc/uncached-7.c: Set it to XFAIL.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arc: Remove unused/incomplete alignment assembly annotation.

Removes the '&' print operand punct character, disables the -mannotate-align
option and cleans up the port.

gcc/

* config/arc/arc-protos.h (arc_clear_unalign): Remove.
(arc_toggle_unalign): Likewise.
* config/arc/arc.cc (machine_function): Remove unalign.
(arc_init): Remove `&` punct character.
(arc_print_operand): Remove `&` related functions.
(arc_verify_short): Update function's number of parameters.
(output_short_suffix): Update function.
(arc_short_long): Likewise.
(arc_clear_unalign): Remove.
(arc_toggle_unalign): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Remove.
(ASM_OUTPUT_ALIGN): Update.
* config/arc/arc.md: Remove all `%&` references.
* config/arc/arc.opt (mannotate-align): Ignore option.
* doc/invoke.texi (mannotate-align): Update description.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

Fix SIMD call SLP discovery

When we do SLP discovery of SIMD calls we run into the issue that
when the call is neither builtin nor internal function we have
cfn == CFN_LAST but internal_fn_p of that returns true. Since
IFN_LAST isn't vectorizable we fail spuriously.

Fixed by checking for cfn != CFN_LAST && internal_fn_p (cfn)
instead.
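
The pitfall can be modelled with a miniature numbering scheme. The constants
and function names below are hypothetical, and the layout (internal functions
numbered after the builtins, with the "last" sentinel at the very end) is an
assumption made for this sketch rather than a copy of GCC's definitions.

```c
/* Hypothetical miniature of a combined builtin/internal numbering.  */
enum {
  EXAMPLE_END_BUILTINS = 3,   /* builtins occupy 0..2 */
  EXAMPLE_IFN_LAST = 2,       /* internal fns are 0..1, 2 is the sentinel */
  EXAMPLE_CFN_LAST = EXAMPLE_END_BUILTINS + EXAMPLE_IFN_LAST
};

/* A pure range test: everything at or past the builtins counts as an
   internal function, including the sentinel.  */
int example_internal_fn_p (int cfn)
{
  return cfn >= EXAMPLE_END_BUILTINS;
}

/* The guarded form: exclude the sentinel before the range test.  */
int example_real_internal_fn_p (int cfn)
{
  return cfn != EXAMPLE_CFN_LAST && example_internal_fn_p (cfn);
}
```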

* tree-vect-slp.cc (vect_build_slp_tree_1): Do not
ask for internal_fn_p (CFN_LAST).

Avoid left around copies when value-numbering BBs

The following makes sure to treat values whose definition we didn't
visit as available since those by definition must dominate the entry
of the region. That avoids unpropagated copies after if-conversion
and resulting SLP discovery fails (which doesn't handle plain copies).

* tree-ssa-sccvn.cc (rpo_elim::eliminate_avail): Treat value
numbers that were not visited as available.

ipa/111643 - clarify flatten attribute documentation

The following clarifies the flatten attribute documentation to mention
the inlining applies also to calls formed as part of inlining earlier
calls but not calls to the function itself.

PR ipa/111643
* doc/extend.texi (attribute flatten): Clarify.

Daily bump.

Add a GCC Security policy

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
ChangeLog:

* SECURITY.txt: New file.

libstdc++: Correctly call _string_types function

flake8 points out that the new call to _string_types from
StdExpAnyPrinter.__init__ is not correct -- it needs to be qualified.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py
(StdExpAnyPrinter.__init__): Qualify call to
_string_types.

ARC: Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

This patch splits SImode shifts, for !TARGET_BARREL_SHIFTER targets,
after combine and before reload, in the split1 pass, as suggested by
the FIXME comment above output_shift in arc.cc.  To do this I've
copied the implementation of the x86_pre_reload_split function from
the i386 backend, and renamed it arc_pre_reload_split.

Although the actual implementations of shifts remain the same
(as in output_shift), having them as explicit instructions in
the RTL stream allows better scheduling and use of compact forms
when available.  The benefits can be seen in two short examples
below.

For the function:
unsigned int foo(unsigned int x, unsigned int y) {
  return y << 2;
}

GCC with -O2 -mcpu=em would previously generate:
foo:    add r1,r1,r1
        add r1,r1,r1
        j_s.d   [blink]
        mov_s   r0,r1   ;4
and with this patch now generates:
foo:    asl_s r0,r1
        j_s.d   [blink]
        asl_s r0,r0

Notice the original (from shift_si3's output_shift) requires the
shift sequence to be monolithic with the same destination register
as the source (requiring an extra mov_s).  The new version can
eliminate this move, and schedule the second asl in the branch
delay slot of the return.
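
Both instruction sequences compute the same value. As a rough C model
(the function names are hypothetical, and 32-bit unsigned arithmetic
stands in for SImode registers), the old form doubles the value in
place twice, while the new form is two independent one-bit shifts
that may use a fresh destination:

```c
#include <stdint.h>

/* Old sequence: add r1,r1,r1 twice, source and destination identical.  */
uint32_t
shl2_by_adds (uint32_t y)
{
  y = y + y;
  y = y + y;
  return y;
}

/* New sequence: two single-bit shifts, the first of which already
   writes the result register.  */
uint32_t
shl2_by_asl (uint32_t y)
{
  uint32_t r = y << 1;
  r = r << 1;
  return r;
}
```

Because each one-bit shift is now a separate instruction in the model,
nothing forces the pair to stay adjacent, which is what lets the
scheduler place the second one in the delay slot.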

For the function:
int x,y,z;

void bar()
{
  x <<= 3;
  y <<= 3;
  z <<= 3;
}

GCC -O2 -mcpu=em currently generates:
bar: push_s  r13
        ld.as   r12,[gp,@x@sda] ;23
        ld.as   r3,[gp,@y@sda]  ;23
        mov r2,0
        add3 r12,r2,r12
        mov r2,0
        add3 r3,r2,r3
        ld.as   r2,[gp,@z@sda]  ;23
        st.as   r12,[gp,@x@sda] ;26
        mov r13,0
        add3 r2,r13,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s.d   [blink]
        pop_s   r13

where each shift by 3 uses ARC's add3 instruction, which is similar
to x86's lea, implementing x = (y<<3) + z, but requires the value zero
to be placed in a temporary register "z".  Splitting this before reload
allows these pseudos to be shared/reused.  With this patch, we get:

bar: ld.as   r2,[gp,@x@sda]  ;23
        mov_s   r3,0    ;3
        add3    r2,r3,r2
        ld.as   r3,[gp,@y@sda]  ;23
        st.as   r2,[gp,@x@sda]  ;26
        ld.as   r2,[gp,@z@sda]  ;23
        mov_s   r12,0   ;3
        add3    r3,r12,r3
        add3    r2,r12,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s     [blink]

Unfortunately, register allocation means that we only share two of the
three "mov_s z,0", but this is sufficient to reduce register pressure
enough to avoid spilling r13 in the prologue/epilogue.
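
The role of the zero register can be modelled in C. This is only a
sketch: add3 below is a plain function written to match the operand
order seen in the listings (destination = second operand + third
operand << 3), and bar_model is a hypothetical stand-in for bar that
takes pointers instead of using globals.

```c
#include <stdint.h>

/* Model of "add3 a,b,c" as used above: a = b + (c << 3).  */
uint32_t
add3 (uint32_t b, uint32_t c)
{
  return b + (c << 3);
}

/* A shift by 3 is an add3 whose addend is a register holding zero.  */
uint32_t
shl3_via_add3 (uint32_t v)
{
  uint32_t zero = 0;
  return add3 (zero, v);
}

/* When the zero is an ordinary value rather than something hidden
   inside a monolithic shift sequence, one copy can serve all three
   shifts.  */
void
bar_model (uint32_t *x, uint32_t *y, uint32_t *z)
{
  uint32_t zero = 0;
  *x = add3 (zero, *x);
  *y = add3 (zero, *y);
  *z = add3 (zero, *z);
}
```

In the generated code above the register allocator ends up with two
zero registers rather than one, so the model shows the ideal case of
full sharing.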

2023-10-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc-protos.h (emit_shift): Delete prototype.
(arc_pre_reload_split): New function prototype.
* config/arc/arc.cc (emit_shift): Delete function.
(arc_pre_reload_split): New predicate function, copied from i386,
to schedule define_insn_and_split splitters to the split1 pass.
* config/arc/arc.md (ashlsi3): Expand RTL template unconditionally.
(ashrsi3): Likewise.
(lshrsi3): Likewise.
(shift_si3): Move after other shift patterns, and disable when
operands[2] is one (which is handled by its own define_insn).
Use shiftr4_operator, instead of shift4_operator, as this is no
longer used for left shifts.
(shift_si3_loop): Likewise.  Additionally remove match_scratch.
(*ashlsi3_nobs): New pre-reload define_insn_and_split.
(*ashrsi3_nobs): Likewise.
(*lshrsi3_nobs): Likewise.
(rotrsi3_cnt1): Rename define_insn from *rotrsi3_cnt1.
(add_shift): Rename define_insn from *add_shift.
* config/arc/predicates.md (shiftl4_operator): Delete.
(shift4_operator): Delete.

gcc/testsuite/ChangeLog
* gcc.target/arc/ashrsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/ashrsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/ashrsi-3.c: Likewise.
* gcc.target/arc/ashrsi-4.c: Likewise.
* gcc.target/arc/ashrsi-5.c: Likewise.
* gcc.target/arc/lshrsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/lshrsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/lshrsi-3.c: Likewise.
* gcc.target/arc/lshrsi-4.c: Likewise.
* gcc.target/arc/lshrsi-5.c: Likewise.
* gcc.target/arc/shlsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/shlsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/shlsi-3.c: Likewise.
* gcc.target/arc/shlsi-4.c: Likewise.
* gcc.target/arc/shlsi-5.c: Likewise.