x86: Update -mtune=tremont

Initial -mtune=tremont update

1. Use Haswell scheduling model.
2. Assume that the stack engine allows push and pop instructions to
execute in parallel.
3. Prepare for scheduling pass as -mtune=generic.
4. Use the same issue rate as -mtune=generic.
5. Enable partial_reg_dependency.
6. Disable accumulate_outgoing_args
7. Enable use_leave
8. Enable push_memory
9. Disable four_jump_limit
10. Disable opt_agu
11. Disable avoid_lea_for_addr
12. Disable avoid_mem_opnd_for_cmove
13. Enable misaligned_move_string_pro_epilogues
14. Enable use_cltd
16. Enable avoid_false_dep_for_bmi
17. Enable avoid_mfence
18. Disable expand_abs
19. Enable sse_typeless_stores
20. Enable sse_load0_by_pxor
21. Disable split_mem_opnd_for_fp_converts
22. Disable slow_pshufb
23. Enable partial_reg_dependency

This is the first patch to tune for Tremont.  With all patches applied,
performance impacts on SPEC CPU 2017 are:

500.perlbench_r         1.81%
502.gcc_r               0.57%
505.mcf_r               1.16%
520.omnetpp_r           0.00%
523.xalancbmk_r         0.00%
525.x264_r              4.55%
531.deepsjeng_r         0.00%
541.leela_r             0.39%
548.exchange2_r         1.13%
557.xz_r                0.00%
geomean for intrate     0.95%
503.bwaves_r            0.00%
507.cactuBSSN_r         6.94%
508.namd_r              12.37%
510.parest_r            1.01%
511.povray_r            3.70%
519.lbm_r               36.61%
521.wrf_r               8.79%
526.blender_r           2.91%
527.cam4_r              6.23%
538.imagick_r           0.28%
544.nab_r               21.99%
549.fotonik3d_r         3.63%
554.roms_r              -1.20%
geomean for fprate      7.50%
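
The geomean rows can be cross-checked with a short calculation. The sketch
below assumes each percentage is a gain over the baseline and that the
geometric mean is taken over the ratios 1 + p/100; the commit does not state
this, and the name geomean_percent is made up for illustration. Under that
assumption the per-benchmark figures reproduce the two geomean rows to two
decimal places.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of percentage gains, assuming each gain p corresponds to
// a ratio of 1 + p/100 (an assumption; the commit does not spell this out).
double geomean_percent(const std::vector<double>& gains) {
    double log_sum = 0.0;
    for (double p : gains)
        log_sum += std::log(1.0 + p / 100.0);
    return (std::exp(log_sum / static_cast<double>(gains.size())) - 1.0) * 100.0;
}

// Per-benchmark figures copied from the table in the commit message.
const std::vector<double> intrate = {1.81, 0.57, 1.16, 0.00, 0.00,
                                     4.55, 0.00, 0.39, 1.13, 0.00};
const std::vector<double> fprate  = {0.00, 6.94, 12.37, 1.01, 3.70, 36.61, 8.79,
                                     2.91, 6.23, 0.28, 21.99, 3.63, -1.20};
```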

gcc/ChangeLog

* common/config/i386/i386-common.c: Use Haswell scheduling model
for Tremont.
* config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont
scheduling pass.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont
issue rate to 4.
(ix86_adjust_cost): Handle Tremont.
* config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
Enable for Tremont.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_CLTD): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont.
(X86_TUNE_FOUR_JUMP_LIMIT): Likewise.
(X86_TUNE_OPT_AGU): Likewise.
(X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise.
(X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise.
(X86_TUNE_EXPAND_ABS): Likewise.
(X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise.
(X86_TUNE_SLOW_PSHUFB): Likewise.
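
The Enable/Disable items in this commit amount to adding Tremont to, or
removing it from, the set of processors a tuning applies to. A minimal sketch
of that idea follows. The bit value, the Tuning struct and tuning_enabled are
hypothetical and do not reproduce GCC's tables in x86-tune.def; only the two
Tremont settings shown are taken from the commit message, and other processors
are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical processor bit; GCC's real processor masks are not shown here.
constexpr std::uint64_t kTremont = std::uint64_t{1} << 0;

// A tuning knob together with the set of processors it is enabled for.
struct Tuning {
    const char* name;
    std::uint64_t enabled_for;
};

// Settings from the commit message: use_leave is enabled for Tremont,
// slow_pshufb is disabled for it.
constexpr Tuning kUseLeave{"use_leave", kTremont};
constexpr Tuning kSlowPshufb{"slow_pshufb", 0};

// A tuning applies when the target processor's bit is in its mask.
constexpr bool tuning_enabled(const Tuning& t, std::uint64_t cpu) {
    return (t.enabled_for & cpu) != 0;
}
```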

Fix PR rtl-optimization/102306

This is a duplication of volatile loads introduced during GCC 9 development
by the 2->2 mechanism of the RTL combiner. There is already substantial
checking for volatile references in can_combine_p, but it implicitly assumes
that the combination reduces the number of instructions, which is of course
not the case here. So the fix teaches try_combine to abort the combination
when it is about to make a copy of volatile references, so that they are
preserved.

gcc/
PR rtl-optimization/102306
* combine.c (try_combine): Abort the combination if we are about to
duplicate volatile references.

gcc/testsuite/
* gcc.target/sparc/20210917-1.c: New test.
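
For context, a volatile access written once in the source has to remain a
single access in the generated code, which is the property a duplicated load
violates. The snippet below only illustrates that rule; it is not the testcase
added by this commit, and device_reg and twice are made-up names.

```cpp
#include <cassert>

// Stand-in for a memory-mapped register (hypothetical).
volatile int device_reg = 0;

// The source performs exactly one volatile read; the compiler may reuse the
// loaded value but must not read device_reg a second time.
int twice() {
    int v = device_reg;  // the single volatile access
    return v + v;        // both operands come from the value already loaded
}
```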

AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_undefined_ph):
New intrinsic.
(_mm256_undefined_ph): Likewise.
(_mm512_undefined_ph): Likewise.
(_mm_cvtsh_h): Likewise.
(_mm256_cvtsh_h): Likewise.
(_mm512_cvtsh_h): Likewise.
(_mm512_castph_ps): Likewise.
(_mm512_castph_pd): Likewise.
(_mm512_castph_si512): Likewise.
(_mm512_castph512_ph128): Likewise.
(_mm512_castph512_ph256): Likewise.
(_mm512_castph128_ph512): Likewise.
(_mm512_castph256_ph512): Likewise.
(_mm512_zextph128_ph512): Likewise.
(_mm512_zextph256_ph512): Likewise.
(_mm512_castps_ph): Likewise.
(_mm512_castpd_ph): Likewise.
(_mm512_castsi512_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_castph_ps):
New intrinsic.
(_mm256_castph_ps): Likewise.
(_mm_castph_pd): Likewise.
(_mm256_castph_pd): Likewise.
(_mm_castph_si128): Likewise.
(_mm256_castph_si256): Likewise.
(_mm_castps_ph): Likewise.
(_mm256_castps_ph): Likewise.
(_mm_castpd_ph): Likewise.
(_mm256_castpd_ph): Likewise.
(_mm_castsi128_ph): Likewise.
(_mm256_castsi256_ph): Likewise.
(_mm256_castph256_ph128): Likewise.
(_mm256_castph128_ph256): Likewise.
(_mm256_zextph128_ph256): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-typecast-1.c: New test.
* gcc.target/i386/avx512fp16-typecast-2.c: Ditto.
* gcc.target/i386/avx512fp16vl-typecast-1.c: Ditto.
* gcc.target/i386/avx512fp16vl-typecast-2.c: Ditto.

AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtss2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtss2sh-1b.c: Ditto.

AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvtsh_ss):
New intrinsic.
(_mm_mask_cvtsh_ss): Likewise.
(_mm_maskz_cvtsh_ss): Likewise.
(_mm_cvtsh_sd): Likewise.
(_mm_mask_cvtsh_sd): Likewise.
(_mm_maskz_cvtsh_sd): Likewise.
(_mm_cvt_roundsh_ss): Likewise.
(_mm_mask_cvt_roundsh_ss): Likewise.
(_mm_maskz_cvt_roundsh_ss): Likewise.
(_mm_cvt_roundsh_sd): Likewise.
(_mm_mask_cvt_roundsh_sd): Likewise.
(_mm_maskz_cvt_roundsh_sd): Likewise.
(_mm_cvtss_sh): Likewise.
(_mm_mask_cvtss_sh): Likewise.
(_mm_maskz_cvtss_sh): Likewise.
(_mm_cvtsd_sh): Likewise.
(_mm_mask_cvtsd_sh): Likewise.
(_mm_maskz_cvtsd_sh): Likewise.
(_mm_cvt_roundss_sh): Likewise.
(_mm_mask_cvt_roundss_sh): Likewise.
(_mm_maskz_cvt_roundss_sh): Likewise.
(_mm_cvt_roundsd_sh): Likewise.
(_mm_mask_cvt_roundsd_sh): Likewise.
(_mm_maskz_cvt_roundsd_sh): Likewise.
* config/i386/i386-builtin-types.def
(V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT,
V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT,
V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT): Add new builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c: Handle new builtin types.
* config/i386/sse.md (VF48_128): New mode iterator.
(avx512fp16_vcvtsh2<ssescalarmodesuffix><mask_scalar_name><round_saeonly_scalar_name>):
New.
(avx512fp16_vcvt<ssescalarmodesuffix>2sh<mask_scalar_name><round_scalar_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512): Add DF contents.
(src3f): New.
* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2pd-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2psx-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtps2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c: Ditto.

AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvtph_pd):
New intrinsic.
(_mm512_mask_cvtph_pd): Likewise.
(_mm512_maskz_cvtph_pd): Likewise.
(_mm512_cvt_roundph_pd): Likewise.
(_mm512_mask_cvt_roundph_pd): Likewise.
(_mm512_maskz_cvt_roundph_pd): Likewise.
(_mm512_cvtxph_ps): Likewise.
(_mm512_mask_cvtxph_ps): Likewise.
(_mm512_maskz_cvtxph_ps): Likewise.
(_mm512_cvtx_roundph_ps): Likewise.
(_mm512_mask_cvtx_roundph_ps): Likewise.
(_mm512_maskz_cvtx_roundph_ps): Likewise.
(_mm512_cvtxps_ph): Likewise.
(_mm512_mask_cvtxps_ph): Likewise.
(_mm512_maskz_cvtxps_ph): Likewise.
(_mm512_cvtx_roundps_ph): Likewise.
(_mm512_mask_cvtx_roundps_ph): Likewise.
(_mm512_maskz_cvtx_roundps_ph): Likewise.
(_mm512_cvtpd_ph): Likewise.
(_mm512_mask_cvtpd_ph): Likewise.
(_mm512_maskz_cvtpd_ph): Likewise.
(_mm512_cvt_roundpd_ph): Likewise.
(_mm512_mask_cvt_roundpd_ph): Likewise.
(_mm512_maskz_cvt_roundpd_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvtph_pd):
New intrinsic.
(_mm_mask_cvtph_pd): Likewise.
(_mm_maskz_cvtph_pd): Likewise.
(_mm256_cvtph_pd): Likewise.
(_mm256_mask_cvtph_pd): Likewise.
(_mm256_maskz_cvtph_pd): Likewise.
(_mm_cvtxph_ps): Likewise.
(_mm_mask_cvtxph_ps): Likewise.
(_mm_maskz_cvtxph_ps): Likewise.
(_mm256_cvtxph_ps): Likewise.
(_mm256_mask_cvtxph_ps): Likewise.
(_mm256_maskz_cvtxph_ps): Likewise.
(_mm_cvtxps_ph): Likewise.
(_mm_mask_cvtxps_ph): Likewise.
(_mm_maskz_cvtxps_ph): Likewise.
(_mm256_cvtxps_ph): Likewise.
(_mm256_mask_cvtxps_ph): Likewise.
(_mm256_maskz_cvtxps_ph): Likewise.
(_mm_cvtpd_ph): Likewise.
(_mm_mask_cvtpd_ph): Likewise.
(_mm_maskz_cvtpd_ph): Likewise.
(_mm256_cvtpd_ph): Likewise.
(_mm256_mask_cvtpd_ph): Likewise.
(_mm256_maskz_cvtpd_ph): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-expand.c: Handle new builtin types.
* config/i386/sse.md
(VF4_128_8_256): New.
(VF48H_AVX512VL): Ditto.
(ssePHmode): Add HF vector modes.
(castmode): Add new convertible modes.
(qq2phsuff): Ditto.
(ph2pssuffix): New.
(avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>): Ditto.
(avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
(avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask_1): Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>):
Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name>): Ditto.
(*avx512fp16_float_extend_ph<mode>2_load<mask_name>): Ditto.
(avx512fp16_float_extend_phv2df2<mask_name>): Ditto.
(*avx512fp16_float_extend_phv2df2_load<mask_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add vcvttsh2si/vcvttsh2usi.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvttsh_i32):
New intrinsic.
(_mm_cvttsh_u32): Likewise.
(_mm_cvtt_roundsh_i32): Likewise.
(_mm_cvtt_roundsh_u32): Likewise.
(_mm_cvttsh_i64): Likewise.
(_mm_cvttsh_u64): Likewise.
(_mm_cvtt_roundsh_i64): Likewise.
(_mm_cvtt_roundsh_u64): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/sse.md
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>):
New.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvttsh2si-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c: Ditto.
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvttph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2w-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c: Ditto.

AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvttph_epi32):
New intrinsic.
(_mm512_mask_cvttph_epi32): Likewise.
(_mm512_maskz_cvttph_epi32): Likewise.
(_mm512_cvtt_roundph_epi32): Likewise.
(_mm512_mask_cvtt_roundph_epi32): Likewise.
(_mm512_maskz_cvtt_roundph_epi32): Likewise.
(_mm512_cvttph_epu32): Likewise.
(_mm512_mask_cvttph_epu32): Likewise.
(_mm512_maskz_cvttph_epu32): Likewise.
(_mm512_cvtt_roundph_epu32): Likewise.
(_mm512_mask_cvtt_roundph_epu32): Likewise.
(_mm512_maskz_cvtt_roundph_epu32): Likewise.
(_mm512_cvttph_epi64): Likewise.
(_mm512_mask_cvttph_epi64): Likewise.
(_mm512_maskz_cvttph_epi64): Likewise.
(_mm512_cvtt_roundph_epi64): Likewise.
(_mm512_mask_cvtt_roundph_epi64): Likewise.
(_mm512_maskz_cvtt_roundph_epi64): Likewise.
(_mm512_cvttph_epu64): Likewise.
(_mm512_mask_cvttph_epu64): Likewise.
(_mm512_maskz_cvttph_epu64): Likewise.
(_mm512_cvtt_roundph_epu64): Likewise.
(_mm512_mask_cvtt_roundph_epu64): Likewise.
(_mm512_maskz_cvtt_roundph_epu64): Likewise.
(_mm512_cvttph_epi16): Likewise.
(_mm512_mask_cvttph_epi16): Likewise.
(_mm512_maskz_cvttph_epi16): Likewise.
(_mm512_cvtt_roundph_epi16): Likewise.
(_mm512_mask_cvtt_roundph_epi16): Likewise.
(_mm512_maskz_cvtt_roundph_epi16): Likewise.
(_mm512_cvttph_epu16): Likewise.
(_mm512_mask_cvttph_epu16): Likewise.
(_mm512_maskz_cvttph_epu16): Likewise.
(_mm512_cvtt_roundph_epu16): Likewise.
(_mm512_mask_cvtt_roundph_epu16): Likewise.
(_mm512_maskz_cvtt_roundph_epu16): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvttph_epi32):
New intrinsic.
(_mm_mask_cvttph_epi32): Likewise.
(_mm_maskz_cvttph_epi32): Likewise.
(_mm256_cvttph_epi32): Likewise.
(_mm256_mask_cvttph_epi32): Likewise.
(_mm256_maskz_cvttph_epi32): Likewise.
(_mm_cvttph_epu32): Likewise.
(_mm_mask_cvttph_epu32): Likewise.
(_mm_maskz_cvttph_epu32): Likewise.
(_mm256_cvttph_epu32): Likewise.
(_mm256_mask_cvttph_epu32): Likewise.
(_mm256_maskz_cvttph_epu32): Likewise.
(_mm_cvttph_epi64): Likewise.
(_mm_mask_cvttph_epi64): Likewise.
(_mm_maskz_cvttph_epi64): Likewise.
(_mm256_cvttph_epi64): Likewise.
(_mm256_mask_cvttph_epi64): Likewise.
(_mm256_maskz_cvttph_epi64): Likewise.
(_mm_cvttph_epu64): Likewise.
(_mm_mask_cvttph_epu64): Likewise.
(_mm_maskz_cvttph_epu64): Likewise.
(_mm256_cvttph_epu64): Likewise.
(_mm256_mask_cvttph_epu64): Likewise.
(_mm256_maskz_cvttph_epu64): Likewise.
(_mm_cvttph_epi16): Likewise.
(_mm_mask_cvttph_epi16): Likewise.
(_mm_maskz_cvttph_epi16): Likewise.
(_mm256_cvttph_epi16): Likewise.
(_mm256_mask_cvttph_epi16): Likewise.
(_mm256_maskz_cvttph_epi16): Likewise.
(_mm_cvttph_epu16): Likewise.
(_mm_mask_cvttph_epu16): Likewise.
(_mm_maskz_cvttph_epu16): Likewise.
(_mm256_cvttph_epu16): Likewise.
(_mm256_mask_cvttph_epu16): Likewise.
(_mm256_maskz_cvttph_epu16): Likewise.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/sse.md
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>):
New.
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>): Ditto.
(*avx512fp16_fix<fixunssuffix>_trunc<mode>2_load<mask_name>): Ditto.
(avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>): Ditto.
(avx512fp16_fix<fixunssuffix>_truncv2di2_load<mask_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512): Add int32
component.
* gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtsh2si-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c: Ditto.

AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic.
(_mm_cvtsh_u32): Likewise.
(_mm_cvt_roundsh_i32): Likewise.
(_mm_cvt_roundsh_u32): Likewise.
(_mm_cvtsh_i64): Likewise.
(_mm_cvtsh_u64): Likewise.
(_mm_cvt_roundsh_i64): Likewise.
(_mm_cvt_roundsh_u64): Likewise.
(_mm_cvti32_sh): Likewise.
(_mm_cvtu32_sh): Likewise.
(_mm_cvt_roundi32_sh): Likewise.
(_mm_cvt_roundu32_sh): Likewise.
(_mm_cvti64_sh): Likewise.
(_mm_cvtu64_sh): Likewise.
(_mm_cvt_roundi64_sh): Likewise.
(_mm_cvt_roundu64_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c (ix86_expand_round_builtin):
Handle new builtin types.
* config/i386/sse.md
(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>):
New define_insn.
(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2): Likewise.
(avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

Daily bump.

libgo: update to go1.17.1 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/350414

analyzer: Fix bootstrap with clang

gcc/analyzer/ChangeLog:
PR bootstrap/102242
* engine.cc (INCLUDE_UNIQUE_PTR): Define.

libstdc++: Regenerate the src/debug Makefiles as needed

When the build configuration changes and Makefiles are recreated, the
src/debug/Makefile and src/debug/*/Makefile files are not recreated,
because they're not managed in the usual way by automake. This can lead
to build failures or surprising inconsistencies between the main and
debug versions of the library when doing incremental builds.

This causes them to be regenerated if any of the corresponding non-debug
makefiles is newer.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* src/Makefile.am (stamp-debug): Add all Makefiles as
prerequisites.
* src/Makefile.in: Regenerate.

libstdc++: Increase timeout factor for slow pb_ds tests

Compiling these tests still times out too often when running the
testsuite with more parallel jobs than there are available cores.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/ext/pb_ds/regression/tree_map_rand.cc: Increase
timeout factor to 3.
* testsuite/ext/pb_ds/regression/tree_set_rand.cc: Likewise.

libstdc++: Update documentation that only refers to c++98 and c++11

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml: Generalize to apply to more than
just -std=c++11.
* doc/html/manual/using_macros.html: Regenerate.

libstdc++: Add noexcept to std::nullopt_t constructor

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/std/optional (nullopt_t): Make constructor noexcept.

libstdc++: Remove non-deducible parameter for std::advance overload

This was just a copy and paste error.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (advance): Remove non-deducible
template parameter.

libstdc++: Add missing 'constexpr' to std::tuple [PR102270]

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.

libstdc++: Add missing constraint to std::span deduction guide [PR102280]

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/102280
* include/std/span (span(Range&&)): Add constraint to deduction
guide.

libstdc++: Fix recipes for C++11-compiled files in src/c++98

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* src/c++98/Makefile.am: Use CXXCOMPILE not LTCXXCOMPILE.
* src/c++98/Makefile.in: Regenerate.

libstdc++: Add noexcept to std::to_string overloads that don't allocate

When the value is guaranteed to fit in the SSO buffer we know the
string won't allocate, so the function can be noexcept. For 32-bit
integers, we know they need no more than 10 bytes (or 11 with a minus
sign) and the SSO buffer is 15 bytes.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(to_string): Add noexcept if the type width is 32 bits or less.
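
The size argument in this commit message can be checked directly: the longest
decimal rendering of a 32-bit integer is that of INT_MIN. The sketch assumes a
32-bit int and takes the 15-byte SSO capacity from the commit message as a
constant; kSsoCapacity and the two helper names are made up and are not
libstdc++ identifiers.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// SSO capacity quoted in the commit message; treated here as an assumption.
constexpr std::size_t kSsoCapacity = 15;

// Length of the longest decimal string a value of type int can produce.
std::size_t longest_int_string() {
    return std::to_string(INT_MIN).size();
}

// True if every int converts to a string that fits in the assumed SSO buffer.
bool int_fits_in_sso() {
    return longest_int_string() <= kSsoCapacity;
}
```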

libstdc++: Add noexcept to unique_ptr accessors

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h (__uniq_ptr_impl::_M_ptr)
(__uniq_ptr_impl::_M_deleter): Add noexcept.

libstdc++: Fix UB in atomic_ref/wait_notify.cc [PR101761]

Remove UB in atomic_ref/wait_notify test.

Signed-off-by: Thomas Rodgers <trodgers@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101761
* testsuite/29_atomics/atomic_ref/wait_notify.cc (test): Use
va and vb as arguments to wait/notify, remove unused bb local.

rs6000: Handle overloads during program parsing

Although this patch looks quite large, the changes are fairly minimal.
Most of it is duplicating the large function that does the overload
resolution using the automatically generated data structures instead of
the old hand-generated ones.  This doesn't make the patch terribly easy to
review, unfortunately.  Just be aware that generally we aren't changing
the logic and functionality of overload handling.

2021-09-16  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
(altivec_resolve_new_overloaded_builtin): New forward decl.
(rs6000_new_builtin_type_compatible): New function.
(altivec_resolve_overloaded_builtin): Call
altivec_resolve_new_overloaded_builtin.
(altivec_build_new_resolved_builtin): New function.
(altivec_resolve_new_overloaded_builtin): Likewise.
* config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
Likewise.
* config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
name of rs6000_new_builtin_is_supported.

c++: constrained variable template issues [PR98486]

This fixes some issues with constrained variable templates:

  - Constraints aren't checked when explicitly specializing a variable
    template.
  - Constraints aren't attached to a static data member template at
    parse time.
  - Constraints don't get propagated when (partially) instantiating a
    static data member template, so we need to make sure to look up
    constraints using the most general template during satisfaction.

PR c++/98486

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): Always
look up constraints using the most general template.
* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-var-templ1.C: New test.
* g++.dg/cpp2a/concepts-var-templ1a.C: New test.
* g++.dg/cpp2a/concepts-var-templ1b.C: New test.

Fortran - fix handling of optional allocatable DT arguments with INTENT(OUT)

gcc/fortran/ChangeLog:

PR fortran/102287
* trans-expr.c (gfc_conv_procedure_call): Wrap deallocation of
allocatable components of optional allocatable derived type
procedure arguments with INTENT(OUT) into a presence check.

gcc/testsuite/ChangeLog:

PR fortran/102287
* gfortran.dg/intent_out_14.f90: New test.
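
In C++ terms the fix resembles guarding the cleanup of an optional argument
with a null check. The analogy below is hypothetical (Derived and
reset_on_entry are made-up names) and is not the code gfortran emits; it only
shows why the deallocation has to sit inside a presence check when the actual
argument may be absent.

```cpp
#include <cassert>

// Stand-in for a derived type with one allocatable component.
struct Derived {
    int* comp = nullptr;
};

// Models entry to a procedure with an OPTIONAL, INTENT(OUT) dummy argument:
// the component is deallocated only when the argument is present (non-null).
void reset_on_entry(Derived* arg) {
    if (arg != nullptr) {    // presence check
        delete[] arg->comp;  // deallocate the component
        arg->comp = nullptr;
    }
}
```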

Fix PR 67102: Add libstdc++ dependency to libffi

The error message is obvious: -funconfigured-libstdc++-v3 is used
on the g++ command line. So we just add the dependency.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

ChangeLog:

PR bootstrap/67102
* Makefile.def: Have configure-target-libffi depend on
all-target-libstdc++-v3.
* Makefile.in: Regenerate.

[i386] Change ix86_decompose_address return type to bool.

After a recent change only a boolean value is returned.

2021-09-16 Uroš Bizjak <ubizjak@gmail.com>

gcc/
* config/i386/i386-protos.h (ix86_decompose_address):
Change return type to bool.
* config/i386/i386.c (ix86_decompose_address): Ditto.

PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

This mimics what the main Makefile.in does: compile the generator
files under build (with Makefile.in's 'build/%.o' rule for compilation).
It also adds $(RUN_GEN) to optionally run it with valgrind and
the $(build_exeext) suffix.

Before, the .o files were compiled with $(COMPILE), causing link
error with $(LINKER_FOR_BUILD) for build != host.

gcc/
PR target/102353
* config/rs6000/t-rs6000 (build/rs6000-gen-builtins.o, build/rbtree.o):
Added 'build/' to target, use build/%.o rule.
(build/rs6000-gen-builtins$(build_exeext)): Add 'build/' and
'$(build_exeext)' to target and 'build/' for the *.o files.
(rs6000-builtins.c): Update for those changes; run rs6000-gen-builtins
with $(RUN_GEN).

cgraph: Do not warn about caller count mismatches of removed functions

To verify other changes in the patch series, I have been searching for
"Invalid sum of caller counts" string in symtab dump but found that
there are false warnings about functions which have their body removed
because they are now unreachable.  Those are of course invalid and so
this patch avoids checking such cgraph_nodes.

gcc/ChangeLog:

2021-08-20  Martin Jambor  <mjambor@suse.cz>

* cgraph.c (cgraph_node::dump): Do not check caller count sums if
the body has been removed.  Remove trailing whitespace.

coroutines: Small cleanups to await_statement_walker [NFC].

There is no need to make a MODIFY_EXPR for any of the condition
vars that we synthesize.

Expansion of co_return can be carried out independently of any
co_awaits that might be contained which simplifies this.

Where we are rewriting statements to handle await expression
logic, there is no need to carry out any analysis - we just need
to detect the presence of any co_await.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (await_statement_walker): Code cleanups.

middle-end/102360 - adjust .DEFERRED_INIT expansion

This avoids using native_interpret_type when we cannot do it with
the original type of the variable.  Instead, use an integer type
for the initialization and side-step the size limitation of
native_interpret_int.

2021-09-16 Richard Biener <rguenther@suse.de>

PR middle-end/102360
* internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
of non-memory more robust.

* g++.dg/pr102360.C: New testcase.
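
The idea described in this commit, building the pattern as an integer of the
same size and then reinterpreting its bits as the variable's type, can be
sketched in user-level C++. This is not the code in expand_DEFERRED_INIT;
pattern_float is a hypothetical helper, and the fill byte is a parameter
chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "example assumes a 32-bit float");

// Build the repeated-byte pattern in a same-sized integer, then copy its
// bits into the float (a bit-level reinterpretation, not a value conversion).
float pattern_float(unsigned char fill) {
    std::uint32_t bits = 0;
    for (unsigned i = 0; i < sizeof bits; ++i)
        bits = (bits << 8) | fill;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```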

sparc: Add scheduling information for LEON5

The LEON5 can often dual issue instructions from the same 64-bit aligned
double word if there are no data dependencies. Add scheduling information
to avoid scheduling unpairable instructions back-to-back.

gcc/ChangeLog:

* config/sparc/sparc-opts.h (enum sparc_processor_type): Add LEON5
* config/sparc/sparc.c (struct processor_costs): Add LEON5 costs
(leon5_adjust_cost): Increase cost of store with data dependency
on ALU instruction and FPU anti-dependencies.
(sparc_option_override): Add LEON5 costs
(sparc_adjust_cost): Add LEON5 cost adjustments
* config/sparc/sparc.h: Add LEON5
* config/sparc/sparc.md: Include LEON5 scheduling information
* config/sparc/sparc.opt: Add LEON5
* doc/invoke.texi: Add LEON5
* config/sparc/leon5.md: New file.
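
The pairing condition mentioned in this commit, two instructions in the same
64-bit aligned double word, reduces to an address check. The helper below is a
hypothetical illustration assuming 4-byte SPARC instructions; it is not taken
from leon5.md, and it ignores data dependencies.

```cpp
#include <cassert>
#include <cstdint>

// True if two instruction addresses fall in the same 8-byte aligned block.
constexpr bool same_double_word(std::uint32_t a, std::uint32_t b) {
    return (a >> 3) == (b >> 3);
}
```

With 4-byte instructions, addresses 0x1000 and 0x1004 share a double word,
while 0x1004 and 0x1008 do not.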

sparc: Add NOP in stack_protect_set32 if sparc_fix_b2bst enabled

This is needed to prevent the Store -> (Non-store or load) -> Store
sequence.

gcc/ChangeLog:

* config/sparc/sparc.md (stack_protect_set32): Add NOP to prevent
sensitive sequence for B2BST errata workaround.

sparc: Prevent atomic instructions in beginning of functions for UT700

A call to the function might have a load instruction in the delay slot
and a load followed by an atomic instruction could cause a deadlock.

gcc/ChangeLog:

* config/sparc/sparc.c (sparc_do_work_around_errata): Do not begin
functions with atomic instruction in the UT700 errata workaround.

sparc: Skip all empty assembly statements

This version detects multiple empty assembly statements in a row and also
detects non-memory barrier empty assembly statements (__asm__("")). It
can be used instead of next_active_insn().

gcc/ChangeLog:

* config/sparc/sparc.c (next_active_non_empty_insn): New function
that returns next active non empty assembly instruction.
(sparc_do_work_around_errata): Use new function.
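
For reference, the two kinds of empty assembly statement mentioned in this
commit look like this in source code. The snippet uses the GNU inline-asm
extension, so it assumes GCC or a compatible compiler; bump is a made-up
function name.

```cpp
#include <cassert>

int bump(int x) {
    __asm__("");                            // empty asm, no memory clobber
    __asm__ __volatile__("" ::: "memory");  // empty asm used as a compiler memory barrier
    return x + 1;
}
```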

sparc: Treat more instructions as load or store in errata workarounds

Check the attribute of instruction to determine if it performs a store
or load operation. This more generic approach sees the last instruction
in the GOTdata_op model as a potential load and treats the memory barrier
as a potential store instruction.

gcc/ChangeLog:

* config/sparc/sparc.c (store_insn_p): Add predicate for store
attributes.
(load_insn_p): Add predicate for load attributes.
(sparc_do_work_around_errata): Use new predicates.

sparc: Print out bit names for LEON and LEON3 with -mdebug

gcc/ChangeLog:

* config/sparc/sparc.c (dump_target_flag_bits): Print bit names for
LEON and LEON3.

testsuite: Support single-precision in g++.dg/eh/arm-vfp-unwind.C

g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
double-precision FPU support. This patch extends it to support
single-precision, useful for targets without double-precision.

2021-09-16 Richard Earnshaw <rearnsha@arm.com>

gcc/testsuite/
* g++.dg/eh/arm-vfp-unwind.C: Support single-precision.

mips: Fix macro typo

gcc/ChangeLog:

* config/mips/netbsd.h: Fix typo in name of a macro.

Check mask type when doing cond_op related gimple simplification.

gcc/ChangeLog:

PR middle-end/102080
* match.pd: Check mask type when doing cond_op related gimple
simplification.
* tree.c (is_truth_type_for): New function.
* tree.h (is_truth_type_for): New declaration.

gcc/testsuite/ChangeLog:

PR middle-end/102080
* gcc.target/i386/pr102080.c: New test.

AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c: Ditto.

AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvtepi32_ph): New
intrinsic.
(_mm512_mask_cvtepi32_ph): Likewise.
(_mm512_maskz_cvtepi32_ph): Likewise.
(_mm512_cvt_roundepi32_ph): Likewise.
(_mm512_mask_cvt_roundepi32_ph): Likewise.
(_mm512_maskz_cvt_roundepi32_ph): Likewise.
(_mm512_cvtepu32_ph): Likewise.
(_mm512_mask_cvtepu32_ph): Likewise.
(_mm512_maskz_cvtepu32_ph): Likewise.
(_mm512_cvt_roundepu32_ph): Likewise.
(_mm512_mask_cvt_roundepu32_ph): Likewise.
(_mm512_maskz_cvt_roundepu32_ph): Likewise.
(_mm512_cvtepi64_ph): Likewise.
(_mm512_mask_cvtepi64_ph): Likewise.
(_mm512_maskz_cvtepi64_ph): Likewise.
(_mm512_cvt_roundepi64_ph): Likewise.
(_mm512_mask_cvt_roundepi64_ph): Likewise.
(_mm512_maskz_cvt_roundepi64_ph): Likewise.
(_mm512_cvtepu64_ph): Likewise.
(_mm512_mask_cvtepu64_ph): Likewise.
(_mm512_maskz_cvtepu64_ph): Likewise.
(_mm512_cvt_roundepu64_ph): Likewise.
(_mm512_mask_cvt_roundepu64_ph): Likewise.
(_mm512_maskz_cvt_roundepu64_ph): Likewise.
(_mm512_cvtepi16_ph): Likewise.
(_mm512_mask_cvtepi16_ph): Likewise.
(_mm512_maskz_cvtepi16_ph): Likewise.
(_mm512_cvt_roundepi16_ph): Likewise.
(_mm512_mask_cvt_roundepi16_ph): Likewise.
(_mm512_maskz_cvt_roundepi16_ph): Likewise.
(_mm512_cvtepu16_ph): Likewise.
(_mm512_mask_cvtepu16_ph): Likewise.
(_mm512_maskz_cvtepu16_ph): Likewise.
(_mm512_cvt_roundepu16_ph): Likewise.
(_mm512_mask_cvt_roundepu16_ph): Likewise.
(_mm512_maskz_cvt_roundepu16_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvtepi32_ph): New
intrinsic.
(_mm_mask_cvtepi32_ph): Likewise.
(_mm_maskz_cvtepi32_ph): Likewise.
(_mm256_cvtepi32_ph): Likewise.
(_mm256_mask_cvtepi32_ph): Likewise.
(_mm256_maskz_cvtepi32_ph): Likewise.
(_mm_cvtepu32_ph): Likewise.
(_mm_mask_cvtepu32_ph): Likewise.
(_mm_maskz_cvtepu32_ph): Likewise.
(_mm256_cvtepu32_ph): Likewise.
(_mm256_mask_cvtepu32_ph): Likewise.
(_mm256_maskz_cvtepu32_ph): Likewise.
(_mm_cvtepi64_ph): Likewise.
(_mm_mask_cvtepi64_ph): Likewise.
(_mm_maskz_cvtepi64_ph): Likewise.
(_mm256_cvtepi64_ph): Likewise.
(_mm256_mask_cvtepi64_ph): Likewise.
(_mm256_maskz_cvtepi64_ph): Likewise.
(_mm_cvtepu64_ph): Likewise.
(_mm_mask_cvtepu64_ph): Likewise.
(_mm_maskz_cvtepu64_ph): Likewise.
(_mm256_cvtepu64_ph): Likewise.
(_mm256_mask_cvtepu64_ph): Likewise.
(_mm256_maskz_cvtepu64_ph): Likewise.
(_mm_cvtepi16_ph): Likewise.
(_mm_mask_cvtepi16_ph): Likewise.
(_mm_maskz_cvtepi16_ph): Likewise.
(_mm256_cvtepi16_ph): Likewise.
(_mm256_mask_cvtepi16_ph): Likewise.
(_mm256_maskz_cvtepi16_ph): Likewise.
(_mm_cvtepu16_ph): Likewise.
(_mm_mask_cvtepu16_ph): Likewise.
(_mm_maskz_cvtepu16_ph): Likewise.
(_mm256_cvtepu16_ph): Likewise.
(_mm256_mask_cvtepu16_ph): Likewise.
(_mm256_maskz_cvtepu16_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/i386-modes.def: Declare V2HF and V6HF.
* config/i386/sse.md (VI2H_AVX512VL): New.
(qq2phsuff): Ditto.
(sseintvecmode): Add HF vector modes.
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>):
New.
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto.
(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto.
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask_1): Ditto.
(avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto.
(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto.
(avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto.
(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto.
(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1): Ditto.
* config/i386/subst.md (round_qq2phsuff): New subst_attr.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512): Add QI
components.
* gcc.target/i386/avx512fp16-vcvtph2dq-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2w-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c: Ditto.

AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvtph_epi32):
New intrinsic.
(_mm512_mask_cvtph_epi32): Likewise.
(_mm512_maskz_cvtph_epi32): Likewise.
(_mm512_cvt_roundph_epi32): Likewise.
(_mm512_mask_cvt_roundph_epi32): Likewise.
(_mm512_maskz_cvt_roundph_epi32): Likewise.
(_mm512_cvtph_epu32): Likewise.
(_mm512_mask_cvtph_epu32): Likewise.
(_mm512_maskz_cvtph_epu32): Likewise.
(_mm512_cvt_roundph_epu32): Likewise.
(_mm512_mask_cvt_roundph_epu32): Likewise.
(_mm512_maskz_cvt_roundph_epu32): Likewise.
(_mm512_cvtph_epi64): Likewise.
(_mm512_mask_cvtph_epi64): Likewise.
(_mm512_maskz_cvtph_epi64): Likewise.
(_mm512_cvt_roundph_epi64): Likewise.
(_mm512_mask_cvt_roundph_epi64): Likewise.
(_mm512_maskz_cvt_roundph_epi64): Likewise.
(_mm512_cvtph_epu64): Likewise.
(_mm512_mask_cvtph_epu64): Likewise.
(_mm512_maskz_cvtph_epu64): Likewise.
(_mm512_cvt_roundph_epu64): Likewise.
(_mm512_mask_cvt_roundph_epu64): Likewise.
(_mm512_maskz_cvt_roundph_epu64): Likewise.
(_mm512_cvtph_epi16): Likewise.
(_mm512_mask_cvtph_epi16): Likewise.
(_mm512_maskz_cvtph_epi16): Likewise.
(_mm512_cvt_roundph_epi16): Likewise.
(_mm512_mask_cvt_roundph_epi16): Likewise.
(_mm512_maskz_cvt_roundph_epi16): Likewise.
(_mm512_cvtph_epu16): Likewise.
(_mm512_mask_cvtph_epu16): Likewise.
(_mm512_maskz_cvtph_epu16): Likewise.
(_mm512_cvt_roundph_epu16): Likewise.
(_mm512_mask_cvt_roundph_epu16): Likewise.
(_mm512_maskz_cvt_roundph_epu16): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvtph_epi32):
New intrinsic.
(_mm_mask_cvtph_epi32): Likewise.
(_mm_maskz_cvtph_epi32): Likewise.
(_mm256_cvtph_epi32): Likewise.
(_mm256_mask_cvtph_epi32): Likewise.
(_mm256_maskz_cvtph_epi32): Likewise.
(_mm_cvtph_epu32): Likewise.
(_mm_mask_cvtph_epu32): Likewise.
(_mm_maskz_cvtph_epu32): Likewise.
(_mm256_cvtph_epu32): Likewise.
(_mm256_mask_cvtph_epu32): Likewise.
(_mm256_maskz_cvtph_epu32): Likewise.
(_mm_cvtph_epi64): Likewise.
(_mm_mask_cvtph_epi64): Likewise.
(_mm_maskz_cvtph_epi64): Likewise.
(_mm256_cvtph_epi64): Likewise.
(_mm256_mask_cvtph_epi64): Likewise.
(_mm256_maskz_cvtph_epi64): Likewise.
(_mm_cvtph_epu64): Likewise.
(_mm_mask_cvtph_epu64): Likewise.
(_mm_maskz_cvtph_epu64): Likewise.
(_mm256_cvtph_epu64): Likewise.
(_mm256_mask_cvtph_epu64): Likewise.
(_mm256_maskz_cvtph_epu64): Likewise.
(_mm_cvtph_epi16): Likewise.
(_mm_mask_cvtph_epi16): Likewise.
(_mm_maskz_cvtph_epi16): Likewise.
(_mm256_cvtph_epi16): Likewise.
(_mm256_mask_cvtph_epi16): Likewise.
(_mm256_maskz_cvtph_epi16): Likewise.
(_mm_cvtph_epu16): Likewise.
(_mm_mask_cvtph_epu16): Likewise.
(_mm_maskz_cvtph_epu16): Likewise.
(_mm256_cvtph_epu16): Likewise.
(_mm256_mask_cvtph_epu16): Likewise.
(_mm256_maskz_cvtph_epu16): Likewise.
* config/i386/i386-builtin-types.def: Add new builtin types.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/sse.md (sseintconvert): New.
(ssePHmode): Ditto.
(UNSPEC_US_FIX_NOTRUNC): Ditto.
(sseintconvertsignprefix): Ditto.
(avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vmovsh/vmovw.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vmovsh-1a.c: New test.
* gcc.target/i386/avx512fp16-vmovsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-2a.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-2b.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-3a.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-3b.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-4a.c: Ditto.
* gcc.target/i386/avx512fp16-vmovw-4b.c: Ditto.

AVX512FP16: Add vmovw/vmovsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvtsi16_si128):
New intrinsic.
(_mm_cvtsi128_si16): Likewise.
(_mm_mask_load_sh): Likewise.
(_mm_maskz_load_sh): Likewise.
(_mm_mask_store_sh): Likewise.
(_mm_move_sh): Likewise.
(_mm_mask_move_sh): Likewise.
(_mm_maskz_move_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Handle new builtin types.
(ix86_expand_vector_init_one_nonzero): Adjust for FP16 target.
* config/i386/sse.md (VI2F): New mode iterator.
(vec_set<mode>_0): Use new mode iterator.
(avx512f_mov<ssescalarmodelower>_mask): Adjust for HF vector mode.
(avx512f_store<mode>_mask): Ditto.

c++: Small location tweak

As Marek suggested.

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_outermost_constant_expr): Use
protected_set_expr_location.

rs6000: Remove useless toc-fusion option

The toc-fusion option was originally intended for Power9 TOC fusion,
but Power9 ended up not supporting fusion at all, so this patch
removes the useless option.

gcc/ChangeLog:

* config/rs6000/rs6000.opt (-mtoc-fusion): Remove.

Daily bump.

c++: shortcut bad convs during overload resolution, part 2 [PR101904]

The r12-3346 change makes us avoid computing excess argument conversions
during overload resolution, but only when it turns out there's a
strictly viable candidate in the overload set.  If there's no such
candidate then we still need to compute more conversions than strictly
necessary because subsequent conversions after the first bad conversion
can turn a non-strictly viable candidate into an unviable one, and that
affects the outcome of overload resolution and the behavior of its
callers (because of -fpermissive).

But at least in a SFINAE context, the distinction between a non-strictly
viable and an unviable candidate shouldn't matter all that much since
performing a bad conversion is always an error (even with -fpermissive),
and so forming a call to a non-strictly viable candidate will end up
being a SFINAE error anyway, just like in the unviable case.  Hence a
non-strictly viable candidate is effectively unviable (in a SFINAE
context), and we don't really need to distinguish between the two kinds.
We can take advantage of this observation to avoid computing excess
argument conversions even when there's no strictly viable candidate in
the overload set.

This patch implements this idea.  We usually detect a SFINAE context by
looking for the absence of the tf_error flag, but that's not specific
enough: we can also get here from build_user_type_conversion with
tf_error cleared, and there the distinction between a non-strictly
viable candidate and an unviable candidate still matters (it determines
whether a user-defined conversion is bad or just doesn't exist).  So this
patch sets and checks for the tf_conv flag to detect this situation too,
which avoids regressing conv2.C below.

Unlike the previous change, this one does affect the outcome of overload
resolution, but it should do so only in a way that preserves backwards
compatibility with -fpermissive.
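
A small sketch of the user-visible situation, assuming a hypothetical function `takes_mutable` (not part of the patch or its tests). Passing a `const int*` where an `int*` is expected drops a qualifier, which is the kind of bad conversion GCC otherwise only diagnoses under -fpermissive; inside a SFINAE context it is simply a substitution failure, so the candidate behaves as if it were unviable.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

void takes_mutable(int*);  // hypothetical; declared only, never called

// Primary template: the call is not well-formed for T.
template <class T, class = void>
struct callable_with : std::false_type {};

// Chosen only when takes_mutable(T) is well-formed in a SFINAE context.
template <class T>
struct callable_with<T, decltype(void(takes_mutable(std::declval<T>())))>
    : std::true_type {};

static_assert(callable_with<int*>::value, "exact match is viable");
static_assert(!callable_with<const int*>::value,
              "dropping const is a substitution failure here");

bool demo_sfinae_bad_conversion() {
  return callable_with<int*>::value && !callable_with<const int*>::value;
}
```

The trait gives the same answer whether the `const int*` candidate is classified as non-strictly viable or as unviable, which is the observation the patch relies on.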

PR c++/101904

gcc/cp/ChangeLog:

* call.c (build_user_type_conversion_1): Add tf_conv to complain.
(add_candidates): When in a SFINAE context, instead of adding a
candidate to bad_fns just mark it unviable.

gcc/testsuite/ChangeLog:

* g++.dg/ext/conv2.C: New test.
* g++.dg/template/conv17.C: Extend test.

rs6000: fix xcoff section encoding

The encoding needs to be applied if the decl is not an alias, that is, if
either the summary is NULL *or* the decl alias flag is false. This patch
updates the earlier fix to continue with the encoding selection if the
summary is NULL.

gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
Proceed if no symbol summary or the symbol alias flag is false.

c++: add parsing_function_declarator predicate

While looking at PR96184 I noticed that we were recognizing the situation of
parsing a function declarator based on current_binding_level, and that we
ought to make that a predicate function. This patch is just refactoring,
but I just suggested using it in a review of another patch.

gcc/cp/ChangeLog:

* cp-tree.h (parsing_function_declarator): Declare.
* name-lookup.c (set_decl_context_in_fn): Use it.
* parser.c (cp_parser_direct_declarator): Use it.
(parsing_function_declarator): New.

c++: Fix handling of decls with flexible array members initialized with side-effects [PR88578]

> > Note, if the flexible array member is initialized only with non-constant
> > initializers, we have a worse bug that this patch doesn't solve, the
> > splitting of initializers into constant and dynamic initialization removes
> > the initializer and we don't have just wrong DECL_*SIZE, but nothing is
> > emitted when emitting those vars into assembly either and so the dynamic
> > initialization clobbers other vars that may overlap the variable.
> > I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
> > flexible array member in that case.
>
> Makes sense.

So, the following patch fixes that.

The typeck2.c change makes sure we keep those CONSTRUCTORs around (although
they should be empty because all their elts had side-effects or were
non-constant if they were removed earlier), and the varasm.c change is to avoid
ICEs on those as well as ICEs on other flex array members that had some
initializers without side-effects, but not on the last array element.

The code was already asserting that the (index of the last elt in the
CONSTRUCTOR + 1) times elt size is equal to TYPE_SIZE_UNIT of the local->val
type, which is true for C flex arrays or for C++ if they don't have any
side-effects or the last elt doesn't have side-effects. This patch changes
that to an assertion that the TYPE_SIZE_UNIT is greater than or equal to the
offset of the end of the last element in the CONSTRUCTOR and uses TYPE_SIZE_UNIT
(int_size_in_bytes) in the code later on.

2021-09-15 Jakub Jelinek <jakub@redhat.com>

PR c++/88578
PR c++/102295
gcc/
* varasm.c (output_constructor_regular_field): Instead of assertion
that array_size_for_constructor result is equal to size of
TREE_TYPE (local->val) in bytes, assert that the type size is greater
or equal to array_size_for_constructor result and use type size as
fieldsize.
gcc/cp/
* typeck2.c (split_nonconstant_init_1): Don't throw away empty
initializers of flexible array members if they have non-zero type
size.
gcc/testsuite/
* g++.dg/ext/flexary39.C: New test.
* g++.dg/ext/flexary40.C: New test.

c++: default ctor that's also a list ctor [PR102050]

In grok_special_member_properties we need to set TYPE_HAS_COPY_CTOR,
TYPE_HAS_DEFAULT_CONSTRUCTOR and TYPE_HAS_LIST_CTOR independently
from each other because a constructor can be both a default and list
constructor (as in the first testcase), or both a default and copy
constructor (as in the second testcase).

PR c++/102050

gcc/cp/ChangeLog:

* decl.c (grok_special_member_properties): Set
TYPE_HAS_COPY_CTOR, TYPE_HAS_DEFAULT_CONSTRUCTOR
and TYPE_HAS_LIST_CTOR independently from each other.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist125.C: New test.
* g++.dg/cpp0x/initlist126.C: New test.

zero-call-used-regs attr for ada

Make the zero_call_used_regs attribute usable as a Machine_Attribute
pragma.

for gcc/ada/ChangeLog

* gcc-interface/utils.c: Include opts.h.
(handle_zero_call_used_regs_attribute): New.
(gnat_internal_attribute_table): Add zero_call_used_regs.

for gcc/testsuite/ChangeLog

* gnat.dg/zcur_attr.adb, gnat.dg/zcur_attr.ads: New.

i386: port vxworks to TARGET_CPU_P macro

PR target/102351

gcc/ChangeLog:

* config/i386/vxworks.h: Use new macro TARGET_CPU_P.

c++: don't warn about internal interference sizes

Most any compilation on ARM/AArch64 was warning because the default L1 cache
line size of 32B was smaller than the default
std::hardware_constructive_interference_size of 64B. This is mostly due to
inaccurate --param l1-cache-line-size, but it's not helpful to complain to a
user that didn't set the values.
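
For context, a hedged sketch of how user code typically consumes these constants. `kLineSize`, `PaddedCounter` and `CounterPair` are hypothetical names, and the fallback of 64 is an assumption used only when the library does not provide the constant.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineSize = 64;  // assumed fallback, not a measured value
#endif

// Each counter is aligned and padded to kLineSize bytes, so two neighbouring
// counters do not share a block of that size.
struct alignas(kLineSize) PaddedCounter {
  long value;
};

struct CounterPair {
  PaddedCounter first;
  PaddedCounter second;
};

static_assert(alignof(PaddedCounter) == kLineSize, "aligned to kLineSize");
static_assert(sizeof(PaddedCounter) == kLineSize, "padded to kLineSize");

void demo_padding() {
  assert(sizeof(CounterPair) == 2 * kLineSize);
}
```

Code like this only reads the constants; it never sets `--param` values itself, which is why a warning about the defaults is not actionable for such a user.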

gcc/cp/ChangeLog:

* decl.c (cxx_init_decl_processing): Only warn about odd
interference sizes if they were specified with --param.

rs6000: fix symtab_node::get == NULL issue

PR target/102349

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
Check that we have a symbol summary for a symbol.

gcc-changelog: Add FIXME note.

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Add FIXME note.

gcc-changelog: check git commit email address

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Check commit email.
* gcc-changelog/test_email.py: Add new test.
* gcc-changelog/test_patches.txt: Likewise.

target/102348 - fix powerpc-lynxos build

This fixes a similar issue for powerpc-lynxos as fixed for i686-lynxos
already.

2021-09-15 Richard Biener <rguenther@suse.de>

PR target/102348
* config/rs6000/lynx.h: Remove undef of PREFERRED_DEBUGGING_TYPE
to inherit from elfos.h

Optimize for V{8,16,32}HFmode vec_set/extract/init.

gcc/ChangeLog:

PR target/102327
* config/i386/i386-expand.c
(ix86_expand_vector_init_interleave): Use punpcklwd to pack 2
HFmodes.
(ix86_expand_vector_set): Use blendw instead of pinsrw.
* config/i386/i386.c (ix86_can_change_mode_class): Adjust for
AVX512FP16 which supports 16bit vector load.
* config/i386/sse.md (avx512bw_interleave_highv32hi<mask_name>):
Rename to ..
(avx512bw_interleave_high<mode><mask_name>): .. this, and
extend to V32HFmode.
(avx2_interleave_highv16hi<mask_name>): Rename to ..
(avx2_interleave_high<mode><mask_name>): .. this, and extend
to V16HFmode.
(vec_interleave_highv8hi<mask_name>): Rename to ..
(vec_interleave_high<mode><mask_name>): .. this, and extend to V8HFmode.
(<mask_codefor>avx512bw_interleave_lowv32hi<mask_name>):
Rename to ..
(<mask_codefor>avx512bw_interleave_low<mode><mask_name>):
.. this, and extend to V32HFmode.
(avx2_interleave_lowv16hi<mask_name>): Rename to ..
(avx2_interleave_low<mode><mask_name>): .. this, and extend to V16HFmode.
(vec_interleave_lowv8hi<mask_name>): Rename to ..
(vec_interleave_low<mode><mask_name>): .. this, and extend to V8HFmode.
(sse4_1_pblendw): Rename to ..
(sse4_1_pblend<blendsuf>): .. this, and extend to V8HFmode.
(avx2_pblendph): New define_expand.
(<sse2p4_1>_pinsr<ssemodesuffix>): Refactor, use
sseintmodesuffix instead of ssemodesuffix.
(blendsuf): New mode attr.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102327-1.c: New test.
* gcc.target/i386/pr102327-2.c: New test.
* gcc.target/i386/avx512fp16-1c.c: Adjust testcase.

Maintain (mis-)alignment info in the first element of a group

This changes us to maintain and compute (mis-)alignment info for
the first element of a group only rather than for each DR when
doing interleaving and for the earliest, first, or first in the SLP
node (or any pair or all three of those) when SLP vectorizing.

For this to work out the easiest way I have changed the accessors
DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
the first element rather than adjusting all callers.
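
The indirection can be pictured with a toy model. This is a sketch under assumptions, not the tree-vect-data-refs.c code: `DataRef`, `kUnknown` and `misalignment_of` are hypothetical, and it assumes a group member's misalignment can be derived from the first element's value plus the member's constant byte offset from that element.

```cpp
#include <cassert>

const int kUnknown = -1;  // stand-in for an "unknown misalignment" marker

struct DataRef {
  const DataRef* group_first;  // first element of the group (self if first)
  long byte_offset;            // constant offset of this access
  int misalignment;            // only maintained on the group's first element
};

// Accessor that always goes through the first element of the group.
int misalignment_of(const DataRef& dr, int target_alignment) {
  const DataRef& first = *dr.group_first;
  if (first.misalignment == kUnknown) return kUnknown;
  long diff = dr.byte_offset - first.byte_offset;
  long m = (first.misalignment + diff) % target_alignment;
  if (m < 0) m += target_alignment;  // keep the result in [0, alignment)
  return static_cast<int>(m);
}

void demo_group_misalignment() {
  DataRef first = {nullptr, 0, 4};
  first.group_first = &first;
  DataRef second = {&first, 8, 0};
  assert(misalignment_of(second, 16) == 12);
}
```

Callers keep using one accessor, and only the first element's record has to be computed and kept up to date.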

2021-09-13 Richard Biener <rguenther@suse.de>

* tree-vectorizer.h (dr_misalignment): Move out of line.
(dr_target_alignment): New.
(DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
(set_dr_target_alignment): New.
(SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
* tree-vect-data-refs.c (dr_misalignment): Compute and
return the group members misalignment.
(vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
(vect_analyze_data_refs_alignment): Compute alignment only
for the first element of a DR group.
(vect_slp_analyze_node_alignment): Likewise.

AVX512FP16: Adjust builtin name for FP16 builtins to match AVX512F style

AVX512FP16 builtins all use names of the form vaddph_v8hf, while AVX512F
builtins use names like addps128, following the SSE/AVX style. Adjust the
AVX512FP16 builtins to match that format.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: Adjust all builtin calls.
* config/i386/avx512fp16vlintrin.h: Likewise.
* config/i386/i386-builtin.def: Adjust builtin name and
enumeration to match AVX512F style.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Adjust builtin macros.
* gcc.target/i386/sse-13.c: Likewise.
* gcc.target/i386/sse-23.c: Likewise.

tree-optimization/102318 - reduction epilogue re-use

This refines the fix for PR102226 to do the mode conversion
from V2DI to VNx2DI separately from the sign-conversion, retaining
the signedness of the saved accumulator as before the original fix.

2021-09-15 Richard Biener <rguenther@suse.de>

PR tree-optimization/102318
* tree-vect-loop.c (vect_transform_cycle_phi): Revert
previous change and do the mode conversion separately from
the sign conversion.

* gcc.dg/vect/pr102318.c: New testcase.

libstdc++: Check for TLS support on mingw cross-compilers

Native mingw builds enable TLS, but crosses don't because we don't use
GCC_CHECK_TLS in the cross-compiler config.

libstdc++-v3/ChangeLog:

* crossconfig.m4: Check for TLS support on mingw.
* configure: Regenerate.

Output vextract{i,f}{32x4,64x2} for (vec_select:(reg:Vmode) idx) when byte_offset of idx % 16 == 0.

2020-09-13 Hongtao Liu <hongtao.liu@intel.com>
Peter Cordes <peter@cordes.ca>
gcc/ChangeLog:

PR target/91103
* config/i386/sse.md (extract_suf): Add V8SF/V8SI/V4DF/V4DI.
(*vec_extract<mode><ssescalarmodelower>_valign): Output
vextract{i,f}{32x4,64x2} instruction when byte_offset % 16 ==
0.

gcc/testsuite/ChangeLog:

PR target/91103
* gcc.target/i386/pr91103-1.c: Add extract tests.
* gcc.target/i386/pr91103-2.c: Ditto.
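
The condition in the subject line is plain arithmetic and can be sketched as follows. `LaneChoice` and `classify_extract` are hypothetical helpers for illustration, not code from sse.md; they only show when an element's byte offset falls on a 128-bit (16-byte) lane boundary.

```cpp
#include <cassert>

struct LaneChoice {
  bool lane_aligned;  // true when the element starts a 128-bit lane
  unsigned lane;      // index of the 128-bit lane containing the element
};

// idx is the element index, elt_bytes the element size in bytes.
LaneChoice classify_extract(unsigned idx, unsigned elt_bytes) {
  unsigned byte_offset = idx * elt_bytes;
  LaneChoice c;
  c.lane_aligned = (byte_offset % 16 == 0);
  c.lane = byte_offset / 16;
  return c;
}

void demo_classify_extract() {
  // Element 4 of a vector of 4-byte elements sits at byte 16: lane 1.
  assert(classify_extract(4, 4).lane_aligned);
  assert(classify_extract(4, 4).lane == 1);
  // Element 5 sits at byte 20, which is in the middle of lane 1.
  assert(!classify_extract(5, 4).lane_aligned);
}
```

Only in the aligned case is the wanted element the lowest element of a 128-bit lane, which is what makes a lane extract sufficient.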

Add OpenACC 'host_data' testing to 'gfortran.dg/goacc/unexpected-end.f90'

Use underscore instead of space in 'host_data'.

Follow-up to recent commit 33fdbbe4ce6055eb858096d01720ccf94aa854ec
"Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]".

gcc/testsuite/
* gfortran.dg/goacc/unexpected-end.f90: Add OpenACC 'host_data'
testing.

Remove support for vax-openbsd

This removes the support for vax-openbsd which has been discontinued
after the OpenBSD 5.9 release and which has no supported gas or GNU ld
configuration [anymore]. In particular, this target only supports
STABS debuginfo generation.

2021-09-13 Richard Biener <rguenther@suse.de>

* config.gcc: Remove vax-*-openbsd* configuration.

contrib/
* config-list.mk: Remove vax-openbsd.

Remove m68k-openbsd support

This removes m68k-openbsd as a valid configuration, according
to openbsd.org m68k-openbsd [on the mac] was discontinued after
the 5.1 release.  The configuration is also not (or no longer)
supported by gas and GNU ld so I could not figure out whether it is still
a.out (I suspect it is).  But first and foremost the target only supports
STABS as a debugging format.

2021-09-13  Richard Biener  <rguenther@suse.de>

* config.gcc: Remove m68k-openbsd.

contrib/
* config-list.mk: Remove m68k-openbsd.

c++: don't predeclare std::type_info [PR48396]

We've always predeclared std::type_info, which has been wrong for a while,
but now with modules it becomes more of a practical problem, if we want to
declare it in the purview of a module. So don't predeclare it. For
building up the type_info information to write out with the vtable, we can
use void* instead of type_info*, since they already aren't the real types.
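
From the user's side, the rule this relies on is that `<typeinfo>` must be included (or imported) before `typeid` is used. A minimal conforming use, with hypothetical types `Shape`, `Circle` and `Square`:

```cpp
#include <cassert>
#include <typeinfo>  // must precede any use of typeid

struct Shape {
  virtual ~Shape() {}
};
struct Circle : Shape {};
struct Square : Shape {};

// Compares the dynamic types of two polymorphic objects.
bool same_dynamic_type(const Shape& a, const Shape& b) {
  return typeid(a) == typeid(b);
}

void demo_typeid() {
  Circle c1, c2;
  Square s;
  assert(same_dynamic_type(c1, c2));
  assert(!same_dynamic_type(c1, s));
}
```

Without the include, the compiler no longer supplies a declaration of std::type_info on its own.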

PR c++/48396

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Remove CPTI_TYPE_INFO_PTR_TYPE.
(type_info_ptr_type): Remove.
* rtti.c (init_rtti_processing): Don't predeclare std::type_info.
(typeid_ok_p): Check for null const_type_info_type_node.
(type_info_ptr_type, get_void_tinfo_ptr): New fns.
(get_tinfo_decl_dynamic, get_tinfo_ptr): Use them.
(ptr_initializer, ptm_initializer, get_pseudo_ti_init): Use them.
(get_tinfo_desc): Use const_ptr_type_node.

gcc/testsuite/ChangeLog:

* g++.dg/rtti/undeclared1.C: New test.

c++: correct object scope handling

The way cp_parser_lookup_name handles object scope (i.e. the scope on the
RHS of a . or -> expression) is a bit subtle: before the lookup it's in
parser->context->object_type, and after the lookup it's in
parser->object_scope. But a couple of places that elide lookups were
failing to do the same transform.

I'm not aware of this breaking anything currently.

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_name): Move object type.
(cp_parser_pre_parsed_nested_name_specifier): Likewise.

c++: tweak C++20 destructor template-id rule

While working on a larger change to destructor lookup I noticed that this
rule talks about declarators, but we weren't limiting the error to the case
where we're parsing a declarator. I don't know if this actually broke
anything, since a CPP_TEMPLATE_ID would have to have been parsed once
before, but it's more correct this way.

gcc/cp/ChangeLog:

* parser.c (cp_parser_unqualified_id): Only complain about ~A<T> in
a declarator.

gcc: xtensa: fix PR target/102336

2021-09-14 Max Filippov <jcmvbkbc@gmail.com>
gcc/
PR target/102336
* config/xtensa/t-xtensa (TM_H): Add include/xtensa-config.h.

Daily bump.

Fortran - fix ICE during error recovery checking entry characteristics

gcc/fortran/ChangeLog:

PR fortran/102311
* resolve.c (resolve_entries): Attempt to recover cleanly after
rejecting mismatched function entries.

gcc/testsuite/ChangeLog:

PR fortran/102311
* gfortran.dg/entry_25.f90: New test.

c++tools : Add a simple handler for ModuleCompiledRequest.

This just replies with "OK".

c++tools/ChangeLog:

* resolver.cc (module_resolver::ModuleCompiledRequest):
Add a simple handler.
* resolver.h: Declare handler for ModuleCompiledRequest.

rs6000: Disable optimizing multiple xxsetaccz instructions into one xxsetaccz

Fwprop will happily optimize two xxsetaccz instructions into one xxsetaccz
by propagating the results of the first to the uses of the second.
We really don't want that to happen given the late priming/depriming of
accumulators.  I fixed this by making the xxsetaccz source operand an
unspec volatile.  I also removed the mma_xxsetaccz define_expand and
define_insn_and_split and replaced it with a simple define_insn.
The expand and splitter patterns were leftovers from the pre opaque mode
code when the xxsetaccz code was part of the movpxi pattern, and we don't
need them now.

Rather than a new test case, I was able to just modify the current test case
to add another __builtin_mma_xxsetaccz call which shows the bad code gen
with unpatched compilers.

2021-09-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
* config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_XXSETACCZ.
(unspecv): Add UNSPECV_MMA_XXSETACCZ.
(*mma_xxsetaccz): Delete.
(mma_xxsetaccz): Change to define_insn.  Remove operand 1.
Use UNSPECV_MMA_XXSETACCZ.  Update comment.
* config/rs6000/rs6000.c (rs6000_rtx_costs): Use UNSPECV_MMA_XXSETACCZ.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-6.c: Add second call to xxsetacc
built-in.  Update instruction counts.

configure: Avoid unnecessary constraints on executables for $build.

The executables for GCC's c-family compilers must be built with no-PIE
because they use PCH and the current model for this requires that the
exe is always launched at the same address.  Since the other language
compilers share code with the c-family this constraint is also applied
to them.

However, the executables that run on $build (generators, and parsers
for md and def files) need not have any such constraint, since they do
not consume PCH files.

This change simplifies the configuration and Makefile content by
removing the code enforcing no-PIE on these exes.  This also fixes a
bootstrap issue with some Darwin versions and clang as the bootstrap
compiler, where -no-PIE causes the correct relocation model to be
switched off leading to invalid user-space code.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* Makefile.in: Remove variables related to applying no-PIE
to the exes on $build.
* configure: Regenerate.
* configure.ac: Remove configuration related to applying
no-PIE to the exes on $build.

coroutines: Make proxy vars for the function arg copies.

This adds top level proxy variables for the coroutine frame
copies of the original function args. These are then available
in the debugger to refer to the frame copies. We rewrite the
function body to use the copies, since the original parms will
no longer be in scope when the coroutine is running.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (struct param_info): Add copy_var.
(build_actor_fn): Use simplified param references.
(register_param_uses): Likewise.
(rewrite_param_uses): Likewise.
(analyze_fn_parms): New function.
(coro_rewrite_function_body): Add proxies for the fn
parameters to the outer bind scope of the rewritten code.
(morph_fn_to_coro): Use simplified version of param ref.

coroutines: Expose implementation state to the debugger.

In the process of transforming a coroutine into the separate representation
as the ramp function and a state machine, we generate some variables that
are of interest to a user during debugging.  Any variable that is persistent
for the execution of the coroutine is placed into the coroutine frame.

In particular:
  The promise object.
  The function pointers for the resumer and destroyer.
  The current resume index (suspend point).
  The handle that represents this coroutine (the 'self handle').
  Any handle provided for a continuation coroutine.
  Whether the coroutine frame is allocated and needs to be freed.

Visibility of some of these has already been requested by end users.

This patch ensures that such variables have names that are usable in a
debugger, but are in the reserved namespace for the implementation (they
all begin with _Coro_).  The identifiers are generated lazily when the
first coroutine is encountered.

We place the variables into the outermost bind expression and then add a
DECL_VALUE_EXPR to each that points to the frame entry.

These changes simplify the handling of the variables in the body of the
function (in particular, the use of the DECL_VALUE_EXPR means that we now
no longer need to rewrite proxies for the promise and coroutine handles into
the frame->offset form).

Partial improvement to debugging (PR c++/99215).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (coro_resume_fn_id, coro_destroy_fn_id,
coro_promise_id, coro_frame_needs_free_id, coro_resume_index_id,
coro_self_handle_id, coro_actor_continue_id,
coro_frame_i_a_r_c_id): New.
(coro_init_identifiers): Initialize new name identifiers.
(coro_promise_type_found_p): Use pre-built identifiers.
(struct await_xform_data): Remove unused fields.
(transform_await_expr): Delete code that is now unused.
(build_actor_fn): Simplify interface, use pre-built identifiers and
remove transforms that are no longer needed.
(build_destroy_fn): Use revised field names.
(register_local_var_uses): Use pre-built identifiers.
(coro_rewrite_function_body): Simplify interface, use pre-built
identifiers.  Generate proxy vars in the outer bind expr scope for the
implementation state that we wish to expose.
(morph_fn_to_coro): Adjust comments for new variable names, use pre-
built identifiers.  Remove unused code to generate frame entries for
the implementation state.  Adjust call for build_actor_fn.

c++: empty union member activation during constexpr [PR102163]

Here, the union's constructor is defined to activate its empty data
member _M_rest, but during constexpr evaluation of this constructor the
subobject constructor call O::O(&_M_rest, 42) doesn't produce a side
effect that actually activates the member, so the union still appears
uninitialized after its constructor has run. This patch fixes this by
using a dummy MODIFY_EXPR in this situation, whose evaluation ensures
the member gets activated.
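
A minimal sketch of the kind of code affected, assuming a reduction along
the lines described above.  The names U, rest and tag are illustrative and
are not taken from the PR testcase; O mirrors the empty class in the
constructor call quoted above.

```cpp
// Illustrative reduction only; not the testcase added by the patch.
struct O {
  constexpr O(int) {}                       // empty class: construction stores nothing
  constexpr int tag() const { return 7; }   // hypothetical helper used to probe the member
};

union U {
  O rest;                                   // empty data member of the union
  constexpr U(int x) : rest(x) {}           // the constructor activates `rest`
};

// The union must count as initialized once its constructor has run,
// otherwise this declaration is rejected as not being a constant expression.
constexpr U u(42);

// Using `rest` in a constant expression requires it to be the active member.
static_assert(u.rest.tag() == 7, "rest should be the active member");
```

The static_assert is only a compile-time probe: it succeeds when the
constant evaluator treats rest as the active member after U's constructor
has been evaluated.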

PR c++/102163

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_call_expression): After evaluating a
subobject constructor call for an empty union member, produce a
side effect that makes sure the member gets activated.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty17.C: New test.

c++: Update DECL_*SIZE for objects with flexible array members with initializers [PR102295]

The C FE has updated DECL_*SIZE for vars which have initializers for
flexible array members for many years, but the C++ FE kept DECL_*SIZE the
same as the type size (i.e. as if there were zero elements in the flexible
array member). This results e.g. in ELF symbol sizes being too small.

Note, if the flexible array member is initialized only with non-constant
initializers, we have a worse bug that this patch doesn't solve: the
splitting of initializers into constant and dynamic initialization removes
the initializer, so we don't just have the wrong DECL_*SIZE, but nothing is
emitted when emitting those vars into assembly either, and so the dynamic
initialization clobbers other vars that may overlap the variable.
I think we need to keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
flexible array member in that case.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102295
* decl.c (layout_var_decl): For aggregates ending with a flexible
array member, add the size of the initializer for that member to
DECL_SIZE and DECL_SIZE_UNIT.

* g++.target/i386/pr102295.C: New test.

c++: Fix __is_*constructible/assignable for templates [PR102305]

is_xible_helper returns error_mark_node (i.e. false from the traits)
for abstract classes by testing ABSTRACT_CLASS_TYPE_P (to) early.
Unfortunately, as the testcase shows, that doesn't work on class templates
that haven't been instantiated yet: ABSTRACT_CLASS_TYPE_P is false for such
a type until it is instantiated, which is done when the routine later
constructs a dummy object with that type.

The following patch fixes this by calling complete_type first, so that
the ABSTRACT_CLASS_TYPE_P test will work properly, while keeping the handling
of arrays with unknown bounds, or incomplete types where it is done
currently.
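
A small sketch of the situation, assuming the standard library trait is
implemented on top of the built-in.  The names Abstract and Concrete are
illustrative and are not taken from the testcase added by the patch.

```cpp
#include <type_traits>

// Illustrative class template with a pure virtual function.  Nothing in this
// file instantiates Abstract<int> before the trait is evaluated.
template <typename T>
struct Abstract {
  virtual ~Abstract() = default;
  virtual T get() const = 0;
};

struct Concrete {
  int value;
};

// An abstract class can never be constructed, so the trait has to be false
// even though the specialization is still uninstantiated at this point.
static_assert(!std::is_constructible<Abstract<int>>::value,
              "abstract class template specialization is not constructible");
static_assert(std::is_constructible<Concrete>::value,
              "plain aggregate is default constructible");
```

The first static_assert is the interesting one: answering it correctly
requires the type to be completed before its abstractness is checked.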

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102305
* method.c (is_xible_helper): Call complete_type on to.

* g++.dg/cpp0x/pr102305.C: New test.

Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]

PR fortran/102313

gcc/fortran/ChangeLog:

* parse.c (gfc_ascii_statement): Add missing ST_OMP_END_SCOPE.

gcc/testsuite/ChangeLog:

* gfortran.dg/goacc/unexpected-end.f90: New test.
* gfortran.dg/gomp/unexpected-end.f90: New test.

testsuite: fix failing pytest tests

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov.py: Fix failing pytests as gcov.json.gz
filename was changed in b777f228b481ae881a7fbb09de367a053740932c.

Fix PR ada/101970

This is a regression present on the mainline and 11 branch in the form of an
ICE for an enumeration type with a full signed representation for its size.

gcc/ada/
PR ada/101970
* exp_attr.adb (Expand_N_Attribute_Reference) <Attribute_Enum_Rep>:
Use an unchecked conversion instead of a regular conversion in the
enumeration case and remove Conversion_OK flag in the integer case.
<Attribute_Pos>: Remove superfluous test.

gcc/testsuite/
* gnat.dg/enum_rep2.adb: New test.

arc: Update ZOL pattern.

The ZOL pattern is missing modes which may lead to errors during
var_tracking. Add them.

gcc/
* config/arc/arc.md (doloop_end): Add missing mode.
(loop_end): Likewise.

Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>

Do not issue size error for too large array type

The error is to be issued when objects of the type are declared instead.

gcc/ada/
* gcc-interface/decl.c (validate_size): Do not issue an error if the
old size has overflowed.

Fix inaccurate bounds in debug info for vector array types

They should not be 0-based, unless the array type itself is.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): For vector types, make
the representative array the debug type.

Fix internal error on broken import of vector intrinsics

The change also makes small adjustments to warning messages for intrinsics.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Turn variable
into constant. Capitalize GCC in warning message.
(intrin_arglists_compatible_p): Change parameter to pointer-to-const.
Adjust warning messages. Turn warning into error for vector types.
(intrin_return_compatible_p): Likewise.
(intrin_profiles_compatible_p): Change parameter to pointer-to-const.

Strengthen compatibility warning for GCC builtins

This is necessary for vector builtins, which are picky about the
signedness of the element type.

gcc/ada/
* libgnat/s-atopri.ads (bool): Delete.
(Atomic_Test_And_Set): Replace bool with Boolean.
(Atomic_Always_Lock_Free): Likewise.
* libgnat/s-aoinar.adb (Is_Lock_Free): Adjust.
* libgnat/s-aomoar.adb (Is_Lock_Free): Likewise.
* libgnat/s-aotase.adb (Atomic_Test_And_Set): Likewise.
* libgnat/s-atopex.adb (Atomic_Compare_And_Exchange): Likewise.
* gcc-interface/decl.c: Include gimple-expr.h.
(intrin_types_incompatible_p): Delete.
(intrin_arglists_compatible_p): Call types_compatible_p.
(intrin_return_compatible_p): Likewise.

Fix internal error on pointer-to-pointer binding in LTO mode

gcc/ada/
* gcc-interface/utils.c (update_pointer_to): Set TYPE_CANONICAL on
pointer and reference types.

testsuite: Use sync_long_long instead of sync_int_long for atomic-29.c test

As discussed, the test exercises atomics on doubles, which are 64-bit, so we
should use the sync_long_long effective target instead of sync_int_long, which
covers 64-bit atomics only on 64-bit arches.  I've added -march=pentium
to follow what is documented for sync_long_long.  I guess -march=zarch should
be added for s390* too, but I haven't tested that.

And using sync_long_long found a syntax error in that effective target
implementation, so I've fixed that too.

2021-09-14  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/gomp/atomic-29.c: Add -march=pentium
dg-additional-options for ia32.  Use sync_long_long effective target
instead of sync_int_long.
* lib/target-supports.exp (check_effective_target_sync_long_long): Fix
a syntax error.

openmp: Add testing checks (whether lhs appears in operands at all) to more trees

This patch adds testing checks (goa_stabilize_expr with NULL pre_p) for more
tree codes, so that we don't gimplify their operands individually unless lhs
appears in them. Also, so that we don't have exponential compile time complexity
with the added checks, I've added a depth computation; we don't expect lhs
to be found at depth 8 or above, as all the atomic forms must have the x expression
in specific places in the expressions.
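
The idea of a depth-bounded test can be sketched as follows.  This is a
simplified illustration and not the gimplify.c code: Expr, kMaxDepth,
mentions_lhs and make_chain are hypothetical names, and the sketch only shows
a search that gives up once the depth limit is reached.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expression tree node.
struct Expr {
  std::vector<const Expr*> operands;
};

// Depths 0..7 are searched; anything at depth 8 or above is not examined.
constexpr int kMaxDepth = 8;

// Returns true if `lhs` occurs in `e` at a depth below kMaxDepth.
bool mentions_lhs(const Expr* e, const Expr* lhs, int depth = 0) {
  if (e == nullptr || depth >= kMaxDepth)
    return false;                       // too deep: stop looking
  if (e == lhs)
    return true;
  for (const Expr* op : e->operands)
    if (mentions_lhs(op, lhs, depth + 1))
      return true;
  return false;
}

// Builds a chain in which node i has node i + 1 as its only operand, so
// node i sits at depth i below node 0.
std::vector<Expr> make_chain(std::size_t length) {
  std::vector<Expr> nodes(length);
  for (std::size_t i = 0; i + 1 < length; ++i)
    nodes[i].operands.push_back(&nodes[i + 1]);
  return nodes;
}
```

With a chain of ten nodes, the node at depth 7 is still found from the root,
while the node at depth 8 is not, because the search stops at the limit.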

2021-09-14 Jakub Jelinek <jakub@redhat.com>

* gimplify.c (goa_stabilize_expr): Add depth argument, propagate
it to recursive calls, for depth above 7 just gimplify or return.
Perform a test even for MODIFY_EXPR, ADDR_EXPR, COMPOUND_EXPR with
__builtin_clear_padding and TARGET_EXPR.
(gimplify_omp_atomic): Adjust goa_stabilize_expr callers.

Implement PR ada/101385

For consistency's sake with -Wall & -w, this makes -Werror imply -gnatwe.

gcc/ada/
PR ada/101385
* doc/gnat_ugn/building_executable_programs_with_gnat.rst
(-Wall): Minor fixes.
(-w): Likewise.
(-Werror): Document that it also sets -gnatwe by default.
* gcc-interface/lang-specs.h (ada): Expand -gnatwe if -Werror is
passed and move expansion of -gnatw switches to before -gnatez.