git.ipfire.org Git - thirdparty/gcc.git/log

riscv: xtheadmempair: Fix doc for th_mempair_order_operands()

There is an incorrect sentence in the documentation of the function
th_mempair_order_operands(). Let's remove it.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_operands_p):
Fix documentation of th_mempair_order_operands().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: xtheadmempair: Fix CFA reg notes

The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu

The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch add support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.

gcc/ChangeLog:

* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'

False dependency happens when destination is only updated by
pternlog. There is no false dependency when destination is also used
in source. So either a pxor should be inserted, or input operand
should be set with constraint '0'.

gcc/ChangeLog:

PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New
define_insn.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to
define_insn_and_split to avoid false dependence.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot<mode>3): Ditto.
(iornot<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.

Initial Granite Rapids D Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add graniterapids-d.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.

i386: Guard 128 bit VAES builtins with AVX512VL

Since commit 24a8acc, 128 bit intrin is enabled for VAES. However,
AVX512VL is not checked until we reached into pattern, which reports an
ICE.

Added an AVX512VL guard at builtin to report error when checking ISA
flags.

gcc/ChangeLog:

* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
Add OPTION_MASK_ISA_AVX512VL.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-vaes-1.c: New test.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add Hao Liu to write after approval

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add Ken Matsui to write after approval

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

Daily bump.

RISC-V: Optimize permutation codegen with vcompress

This patch is to recognize specific permutation pattern which can be applied compress approach.

Consider this following case:
typedef int8_t vnx64i __attribute__ ((vector_size (64)));
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31,    \
    37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81,    \
    82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,    \
    100, 101, 102, 103, 104, 105, 106, 107
void __attribute__ ((noinline, noclone)) test_1 (int8_t *x, int8_t *y, int8_t *out)
{
  vnx64i v1 = *(vnx64i*)x;
  vnx64i v2 = *(vnx64i*)y;
  vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
  *(vnx64i*)out = v3;
}

https://godbolt.org/z/P33nev6cW

Before this patch:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        vl4re8.v        v4,0(a4)
        li      a4,64
        vsetvli a5,zero,e8,m4,ta,mu
        vl4re8.v        v20,0(a0)
        vl4re8.v        v16,0(a1)
        vmv.v.x v12,a4
        vrgather.vv     v8,v20,v4
        vmsgeu.vv       v0,v4,v12
        vsub.vv v4,v4,v12
        vrgather.vv     v8,v16,v4,v0.t
        vs4r.v  v8,0(a2)
        ret

After this patch:
        lui a4,%hi(.LANCHOR0)
        addi a4,a4,%lo(.LANCHOR0)
        vsetvli a5,zero,e8,m4,ta,ma
        vl4re8.v v12,0(a1)
        vl4re8.v v8,0(a0)
        vlm.v v0,0(a4)
        vslideup.vi v4,v12,20
        vcompress.vm v4,v8,v0
        vs4r.v v4,0(a2)
        ret

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Add vcompress optimization.
* config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
(shuffle_compress_patterns): Ditto.
(expand_vec_perm_const_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.

testsuite: Skip failing analyzer tests on AIX.

Some of the analyzer out-of-bounds-diagram tests fail on AIX.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-4.c: Skip on AIX.
* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-7.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-13.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-15.c: Same.

Fortran: formal symbol attributes for intrinsic procedures [PR110288]

gcc/fortran/ChangeLog:

PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments that we do not
wrongly treat them formally as deferred-length.

gcc/testsuite/ChangeLog:

PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.

cfg+gcse: Change return type of predicate functions from int to bool

Also change some internal variables from int to bool.

gcc/ChangeLog:

* cfghooks.cc (verify_flow_info): Change "err" variable to bool.
* cfghooks.h (struct cfg_hooks): Change return type of
verify_flow_info from integer to bool.
* cfgrtl.cc (can_delete_note_p): Change return type from int to bool.
(can_delete_label_p): Ditto.
(rtl_verify_flow_info): Change return type from int to bool
and adjust function body accordingly. Change "err" variable to bool.
(rtl_verify_flow_info_1): Ditto.
(free_bb_for_insn): Change return type to void.
(rtl_merge_blocks): Change "b_empty" variable to bool.
(try_redirect_by_replacing_jump): Change "fallthru" variable to bool.
(verify_hot_cold_block_grouping): Change return type from int to bool.
Change "err" variable to bool.
(rtl_verify_edges): Ditto.
(rtl_verify_bb_insns): Ditto.
(rtl_verify_bb_pointers): Ditto.
(rtl_verify_bb_insn_chain): Ditto.
(rtl_verify_fallthru): Ditto.
(rtl_verify_bb_layout): Ditto.
(purge_all_dead_edges): Change "purged" variable to bool.
* cfgrtl.h (free_bb_for_insn): Change return type from int to void.
* postreload-gcse.cc (expr_hasher::equal): Change "equiv_p" to bool.
(load_killed_in_block_p): Change return type from int to bool
and adjust function body accordingly.
(oprs_unchanged_p): Return true/false.
(rest_of_handle_gcse2): Change return type to void.
* tree-cfg.cc (gimple_verify_flow_info): Change return type from
int to bool. Change "err" variable to bool.

rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector built-in tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in.  There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.

testsuite: Require vectors of doubles for pr97428.c

The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail with
such targets. Limit this test to targets that support vectors of
doubles then.

gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.

[modula2] Improve uninitialized variable analysis by combining basic blocks

This patch combines basic blocks for static analysis of uninitialized
variables providing that they are not the top of a loop, are not reached
by a conditional and are not reached after a procedure call.  It also
avoids checking array accesses for static analysis.  Finally the patch
adds switch modifiers to allow static analysis to include conditional
branches for subsequent basic block analysis.

gcc/ChangeLog:

* doc/gm2.texi (-Wuninit-variable-checking=) New item.

gcc/m2/ChangeLog:

* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New
parameter ScopeSym.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New
parameter ScopeSym.
(InitBasicBlocksFromRange): New parameter ScopeSym.  Call
ConvertQuads2BasicBlock with ScopeSym.
(DisplayBasicBlocks): Uncomment.
* gm2-compiler/M2Code.mod: Replace VariableAnalysis with
ScopeBlockVariableAnalysis.
(InitialDeclareAndOptiomize): Add parameter scope.
(SecondDeclareAndOptimize): Add parameter scope.
* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope
parameter to DeclareTypesConstantsProceduresInRange.
(DeclareTypesConstantsProceduresInRange): New parameter scope.
Pass scope to DisplayQuadRange.  Reformatted.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter
scope.
* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add
arg parameter.
* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add
arg parameter and set boolean UninitVariableChecking and
UninitVariableConditionalChecking.
(UninitVariableConditionalChecking): New boolean set to FALSE.
* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
(DisplayQuadRange): Add scope parameter.
(LoopAnalysis): Add scope parameter.
* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
(IsGoto): New procedure function.
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead
of WarnStringAt.
(BuildStaticArray): Call PutVarArrayRef.
(BuildDynamicArray): Call PutVarArrayRef.
(DisplayQuadRange): Add scope parameter.
(GetM2OperatorDesc): Add relational condition cases.
* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to
DisplayQuadRange.
(ForeachScopeBlockDo): Pass scopeSym to p.
* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
(ScopeBlockVariableAnalysis): ... this.
* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add
scope parameter.
(bbEntry): New pointer to record.
(bbArray): New array.
(bbFreeList): New variable.
(errorList): New list.
(IssueConditional): New procedure.
(GenerateNoteFlow): New procedure.
(IssueWarning): New procedure.
(IsUniqueWarning): New procedure.
(CheckDeferredRecordAccess): Re-implement.
(CheckBinary): Add warning and lst parameters.
(CheckUnary): Add warning and lst parameters.
(CheckXIndr): Add warning and lst parameters.
(CheckIndrX): Add warning and lst parameters.
(CheckBecomes): Add warning and lst parameters.
(CheckComparison): Add warning and lst parameters.
(CheckReadBeforeInitQuad): Add warning and lst parameters to all
Check procedures.  Add all case quadruple clauses.
(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
(bbArrayKill): New procedure.
(DumpBBEntry): New procedure.
(DumpBBArray): New procedure.
(DumpBBSequence): New procedure.
(TestBBSequence): New procedure.
(CreateBBPermultations): New procedure.
(ScopeBlockVariableAnalysis): New procedure.
(GetOp3): New procedure.
(GenerateCFG): New procedure.
(NewEntry): New procedure.
(AppendEntry): New procedure.
(init): Initialize bbFreeList and errorList.
* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
(MakeVar): Set ArrayRef to FALSE.
(PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New definition.
* gm2-lang.cc (gm2_langhook_handle_option): Add new case
OPT_Wuninit_variable_checking_.
* lang.opt: Wuninit-variable-checking= new entry.

gcc/testsuite/ChangeLog:

* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp:
New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

libgomp/

* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.

c++: coercing variable template from current inst [PR110580]

Here during ahead of time coercion of the variable template-id v1<int>,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U just lowers U from level 2 to level 1 rather
than replacing it with int as expected. Thus after coercion we incorrectly
end up with (effectively) v1<int, T> instead of v1<int, int>.

Coercion of a class/alias template-id on the other hand always passes
all levels arguments, which avoids this issue. So this patch makes us
do the same for variable template-ids.

PR c++/110580

gcc/cp/ChangeLog:

* pt.cc (lookup_template_variable): Pass all levels of arguments
to coerce_template_parms, and use the parameters from the most
general template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ83.C: New test.

Fix typo in the testcase.

Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

gcc/testsuite/ChangeLog:

PR target/110170
* g++.target/i386/pr110170.C: Fix typo.

VECT: Add COND_LEN_* operations for loop control with length targets

Hi, Richard and Richi.

This patch is adding cond_len_* operations pattern for target support loop control with length.

These patterns will be used in these following case:

1. Integer division:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
   {
     for (int i = 0; i < n; ++i)
      {
        a[i] = b[i] / c[i];
      }
   }

  ARM SVE IR:

  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  For target like RVV who support loop control with length, we want to see IR as follows:

  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummp_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  Notice here, we use dummp_mask = { -1, -1, .... , -1 }

2. Integer conditional division:
   Similar case with (1) but with condtion:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
   {
     for (int i = 0; i < n; ++i)
       {
         if (cond[i])
         a[i] = b[i] / c[i];
       }
   }

   ARM SVE:
   ...
   max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

   Loop:
   ...
   # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
   ...
   vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
   ...
   vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
   ...
   vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
   vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
   ...
   .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
   ...
   next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });

   Here, ARM SVE use vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to gurantee the correct result.

   However, target with length control can not perform this elegant flow, for RVV, we would expect:

   Loop:
   ...
   loop_len_55 = SELECT_VL
   ...
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   ...
   vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
   ...

   Here we expect COND_LEN_DIV predicated by a real mask which is the outcome of comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
   and a real length which is produced by loop control : loop_len_55 = SELECT_VL

3. conditional Floating-point operations (no -ffast-math):

    void
    f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          if (cond[i])
          a[i] = b[i] + a[i];
        }
    }

  ARM SVE IR:
  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });

  ...
  # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...

  For RVV, we would expect IR:

  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
  ...

4. Conditional un-ordered reduction:

   int32_t
   f (int32_t *restrict a,
   int32_t *restrict cond, int n)
   {
     int32_t result = 0;
     for (int i = 0; i < n; ++i)
       {
           if (cond[i])
         result += a[i];
       }
     return result;
   }

   ARM SVE IR:

     Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
     ...
     vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

   For RVV, we expect:

    Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     loop_len_40 = SELECT_VL
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     ...
     vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

     I name these patterns as "cond_len_*" since I want the length operand comes after mask operand and all other operands except length operand
     same order as "cond_*" patterns. Such order will make life easier in the following loop vectorizer support.

gcc/ChangeLog:

* doc/md.texi: Add COND_LEN_* operations for loop control with length.
* internal-fn.cc (cond_len_unary_direct): Ditto.
(cond_len_binary_direct): Ditto.
(cond_len_ternary_direct): Ditto.
(expand_cond_len_unary_optab_fn): Ditto.
(expand_cond_len_binary_optab_fn): Ditto.
(expand_cond_len_ternary_optab_fn): Ditto.
(direct_cond_len_unary_optab_supported_p): Ditto.
(direct_cond_len_binary_optab_supported_p): Ditto.
(direct_cond_len_ternary_optab_supported_p): Ditto.
* internal-fn.def (COND_LEN_ADD): Ditto.
(COND_LEN_SUB): Ditto.
(COND_LEN_MUL): Ditto.
(COND_LEN_DIV): Ditto.
(COND_LEN_MOD): Ditto.
(COND_LEN_RDIV): Ditto.
(COND_LEN_MIN): Ditto.
(COND_LEN_MAX): Ditto.
(COND_LEN_FMIN): Ditto.
(COND_LEN_FMAX): Ditto.
(COND_LEN_AND): Ditto.
(COND_LEN_IOR): Ditto.
(COND_LEN_XOR): Ditto.
(COND_LEN_SHL): Ditto.
(COND_LEN_SHR): Ditto.
(COND_LEN_FMA): Ditto.
(COND_LEN_FMS): Ditto.
(COND_LEN_FNMA): Ditto.
(COND_LEN_FNMS): Ditto.
(COND_LEN_NEG): Ditto.
* optabs.def (OPTAB_D): Ditto.

tree-optimization/110614 - SLP splat and re-align (optimized)

The following properly guards the re-align (optimized) paths used
on old power CPUs for the added case of SLP splats from non-grouped
loads. Testcases are existing in dg-torture.

PR tree-optimization/110614
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
SLP splats are not suitable for re-align ops.

ada: Avoid renaming_decl in case of constrained array

This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);".
That rewrite is incorrect if S is a constrained array subtype,
because it changes the semantics. In the original, the
bounds of X are that of S. But constraints are ignored in
renamings, so the bounds of X would come from F'Result.
This can cause spurious Constraint_Errors in some obscure
cases. It causes unnecessary checks to be inserted, and even
when such checks pass (more common case), they might be less
efficient.

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to
a renaming in case of constrained array that comes from source.

ada: Fix wrong resolution for hidden discriminant in predicate

The problem occurs for hidden discriminants of private discriminated types.

gcc/ada/

* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component):
In the case of private discriminated types, return a discriminant
only if it is listed in the discriminant part of the declaration.

testsuite: Unbreak pr110557.cc where long is 32-bit

On ports with 32-bit long, the test produced excess errors:

gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type

Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/testsuite/ChangeLog:

* g++.dg/vect/pr110557.cc: Use long long instead of long for
64-bit type.
(test): Remove an unnecessary cast.

libgcc: Fix -Wint-conversion warning in find_fde_tail

Fixes commit r14-1614-g49310a99330849 ("libgcc: Fix eh_frame fast path
in find_fde_tail").

libgcc/

PR libgcc/110179
* unwind-dw2-fde-dip.c (find_fde_tail): Add cast to avoid
implicit conversion of pointer value to integer.

Daily bump.

rs6000: Remove redundant MEM_P predicate usage

The quad_memory_operand and vsx_quad_dform_memory_operand predicates contain
a (match_code "mem") test, making their MEM_P usage redundant. Remove them.

2023-07-10 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/predicates.md (quad_memory_operand): Remove redundant
MEM_P usage.
(vsx_quad_dform_memory_operand): Likewise.

d: Merge upstream dmd, druntime a88e1335f7, phobos 1921d29df.

D front-end changes:

- Import dmd v2.104.1.
- Deprecation phase ended for access to private method when
overloaded with public method.

D runtime changes:

- Import druntime v2.104.1.
- Linux input header translations were added to druntime.
- Integration with the Valgrind `memcheck' tool has been added
to the garbage collector.

Phobos changes:

- Import phobos v2.104.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd a88e1335f7.
* dmd/VERSION: Bump version to v2.104.1.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime a88e1335f7.
* src/MERGE: Merge upstream phobos 1921d29df.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (libphobos-checking): Add valgrind flag.
(DRUNTIME_LIBRARIES_VALGRIND): Call.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add
etc/valgrind/valgrind_.c.
(DRUNTIME_DSOURCES): Add etc/valgrind/valgrind.d.
(DRUNTIME_DSOURCES_LINUX): Add core/sys/linux/input.d,
core/sys/linux/input_event_codes.d, core/sys/linux/uinput.d.
* libdruntime/Makefile.in: Regenerate.
* m4/druntime/libraries.m4 (DRUNTIME_LIBRARIES_VALGRIND): Define.

reorg: Change return type of predicate functions from int to bool

Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* reorg.cc (stop_search_p): Change return type from int to bool
and adjust function body accordingly.
(resource_conflicts_p): Ditto.
(insn_references_resource_p): Change return type from int to bool.
(insn_sets_resource_p): Ditto.
(redirect_with_delay_slots_safe_p): Ditto.
(condition_dominates_p): Change return type from int to bool
and adjust function body accordingly.
(redirect_with_delay_list_safe_p): Ditto.
(check_annul_list_true_false): Ditto.  Change "annul_true_p"
function argument to bool.
(steal_delay_list_from_target): Change "pannul_p" function
argument to bool pointer.  Change "must_annul" and "used_annul"
variables from int to bool.
(steal_delay_list_from_fallthrough): Ditto.
(own_thread_p): Change return type from int to bool and adjust
function body accordingly.  Change "allow_fallthrough" function
argument to bool.
(reorg_redirect_jump): Change return type from int to bool.
(fill_simple_delay_slots): Change "non_jumps_p" function
argument from int to bool.  Change "maybe_never" varible to bool.
(fill_slots_from_thread): Change "likely", "thread_if_true" and
"own_thread" function arguments to bool.  Change "lose" and
"must_annul" variables to bool.
(delete_from_delay_slot): Change "had_barrier" variable to bool.
(try_merge_delay_insns): Change "annul_p" variable to bool.
(fill_eager_delay_slots): Change "own_target" and "own_fallthrouhg"
variables to bool.
(rest_of_handle_delay_slots): Change return type from int to void
and adjust function body accordingly.

c++: redeclare_class_template and ttps [PR110523]

Now that we cache level-lowered ttps we can end up processing the same
ttp multiple times via (multiple calls to) redeclare_class_template, so
we can't assume a ttp's DECL_CONTEXT is initially empty.

PR c++/110523

gcc/cp/ChangeLog:

* pt.cc (redeclare_class_template): Relax the ttp DECL_CONTEXT
assert, and downgrade it to a checking assert.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp37.C: New test.

doc: Add doc for RISC-V Operand Modifiers

Document `z` and `i` operand modifiers, we have much more modifiers
other than those two, but they are the only two implement on both
GCC and LLVM, consider the compatibility I would like to document those
two first, and then review other modifiers later to see if any other should
expose and implement on RISC-V LLVM too.

gcc/ChangeLog:

* doc/extend.texi (RISC-V Operand Modifiers): New.

GCSE: Export 'insert_insn_end_basic_block' as global function

Since VSETVL PASS in RISC-V port is using common part of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)'
and we will also this helper function in riscv.cc for the following patches.

So extract the common part codes of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)', the new function
of the common part is also call 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' but with different arguments.
And call 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' in 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)'
and VSETVL PASS in RISC-V port.

Remove redundant codes of VSETVL PASS in RISC-V port.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (add_label_notes): Remove it.
(insert_insn_end_basic_block): Ditto.
(pass_vsetvl::commit_vsetvls): Adapt for new helper function.
* gcse.cc (insert_insn_end_basic_block): Export as global function.
* gcse.h (insert_insn_end_basic_block): Ditto.

arm: Fix MVE intrinsics support with LTO (PR target/110268)

After the recent MVE intrinsics re-implementation, LTO stopped working
because the intrinsics would no longer be defined.

The main part of the patch is simple and similar to what we do for
AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the
  intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.

It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
its value in the user code cannot be guessed at LTO time, so we always
have to assume that it was not defined.  The led to a few fixes in the
way we register MVE builtins as placeholders or not.  Without this
patch, we would just omit some versions of the inttrinsics when
__ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
placeholders, we need to always keep entries for all of them to ensure
that we have a consistent numbering scheme.

2023-06-26  Christophe Lyon   <christophe.lyon@linaro.org>

PR target/110268
gcc/
* config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
(arm_builtin_decl): Hahndle MVE builtins.
* config/arm/arm-mve-builtins.cc (builtin_decl): New function.
(add_unique_function): Fix handling of
__ARM_MVE_PRESERVE_USER_NAMESPACE.
(add_overloaded_function): Likewise.
* config/arm/arm-protos.h (builtin_decl): New declaration.

gcc/testsuite/
* gcc.target/arm/pr110268-1.c: New test.
* gcc.target/arm/pr110268-2.c: New test.

testsuite: Add _link flavor for several arm_arch* and arm* effective-targets

For arm targets, we generate many effective-targets with
check_effective_target_FUNC_multilib and
check_effective_target_arm_arch_FUNC_multilib which check if we can
link and execute a simple program with a given set of flags/multilibs.

In some cases however, it's possible to link but not to execute a
program, so this patch adds similar _link effective-targets which only
check if link succeeds.

The patch does not uupdate the documentation as it already lacks the
numerous existing related effective-targets.

2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* lib/target-supports.exp (arm_*FUNC_link): New effective-targets.

doc: Document arm_v8_1m_main_cde_mve_fp

The arm_v8_1m_main_cde_mve_fp family of effective targets was not
documented when it was introduced.

2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* doc/sourcebuild.texi (arm_v8_1m_main_cde_mve_fp): Document.

ada: Follow-up fix for compilation issue with recent MinGW-w64 versions

It turns out that adaint.c includes other Windows header files than just
windows.h, so defining WIN32_LEAN_AND_MEAN is not sufficient for it.

gcc/ada/

* adaint.c [_WIN32]: Undefine 'abort' macro.

ada: Add typedefs to snames.h-tmpl

A future patch will change sname.h-tmpl to use enums rather than
preprocessor defines. In order to do this, first introduce some
typedefs that can be used in gcc-interface.

gcc/ada/

* snames.h-tmpl (Name_Id, Attribute_Id, Convention_Id)
(Pragma_Id): New typedefs.
(Get_Attribute_Id, Get_Pragma_Id): Use typedef.

ada: Simplify assertion to remove CodePeer message

CodePeer is correctly warning on a test always true in an assertion.
It can be rewritten without loss of proof to avoid that message.

gcc/ada/

* libgnat/s-aridou.adb (Lemma_Powers_Of_2_Commutation): Rewrite
assertion.

ada: Documentation for mixed declarations and statements

This patch documents the new feature that allows declarations mixed with
statements, primarily by referring to the RFC.

gcc/ada/

* doc/gnat_rm/gnat_language_extensions.rst
(Local Declarations Without Block): Document the feature very
briefly, and refer the reader to the RFC for details and examples.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: hardcfr: optionally disable in leaf functions

Document -fhardcfr-skip-leaf.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Control Flow
Hardening): Document -fhardcfr-skip-leaf.
* gnat_rm.texi: Regenerate.

ada: hardcfr: mark throw-expected functions

Adjust documentation to reflect the introduction of
-fhardcfr-check-noreturn-calls=no-xthrow.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Control Flow
Redundancy): Add -fhardcfr-check-noreturn-calls=no-xthrow.
* gnat_rm.texi: Regenerate.

ada: Adapt proof of System.Arith_Double to remove CVC4

The proof of System.Arith_Double still required the use of
CVC4, now replaced by its successor cvc5. Adapt the proof to be
able to remove CVC4 in the proof of run-time units.

gcc/ada/

* libgnat/s-aridou.adb (Lemma_Div_Mult): New simple lemma.
(Lemma_Powers_Of_2_Commutation): State post in else branch.
(Lemma_Div_Pow2): Introduce local lemma and use it.
(Scaled_Divide): Use cut operations in assertions, lemmas, new
assertions. Introduce local lemma and use it.

ada: Add leafy mode for zero-call-used-regs

Document leafy mode.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Register
Scrubbing): Document leafy mode.
* gnat_rm.texi: Regenerate.

vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result sign-extended.  But this was not handled
correctly.

For example:

    int x : 8;
    long y : 55;
    bool z : 1;

The vectorized extraction of y was:

    vect__ifc__49.29_110 =
      MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
    vect_patt_38.30_112 =
      vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
    vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
    vect_patt_40.32_114 =
      VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);

This is obviously incorrect.  This pach has implemented it as:

    vect__ifc__25.16_62 =
      MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
    vect_patt_31.17_63 =
      VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
    vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
    vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.

i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.

This patch implements another of Uros' suggestions, to investigate a
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently so
that my explanations for why insvti_lowpart_1 isn't required don't apply
in this case.

This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt <dwi> parameterization.

This patch also includes a peephole2 (actually a pair) to transform
xchg instructions into mov instructions, when one of the destinations
is unused.  This optimization is required to produce the optimal code
sequences below.

For the 64-bit case:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 m = ~((__int128)~0ull);
  __int128 t = x & m;
  __int128 r = t | y;
  return r;
}

Before:
        xchgq   %rdi, %rsi
        movq    %rdx, %rax
        xorl    %esi, %esi
        xorl    %edx, %edx
        orq     %rsi, %rax
        orq     %rdi, %rdx
        ret

After:
        movq    %rdx, %rax
        movq    %rsi, %rdx
        ret

For the 32-bit case:

long long bar(long long x, int y)
{
  long long mask = ~0ull << 32;
  long long t = x & mask;
  long long r = t | (unsigned int)y;
  return r;
}

Before:
        pushl   %ebx
        movl    12(%esp), %edx
        xorl    %ebx, %ebx
        xorl    %eax, %eax
        movl    16(%esp), %ecx
        orl     %ebx, %edx
        popl    %ebx
        orl     %ecx, %eax
        ret

After:
        movl    12(%esp), %eax
        movl    8(%esp), %edx
        ret

2023-07-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform xchg insn with a
REG_UNUSED note to a (simple) move.
(*insvti_lowpart_1): New define_insn_and_split.
(*insvdi_lowpart_1): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/insvdi_lowpart-1.c: New test case.
* gcc.target/i386/insvti_lowpart-1.c: Likewise.

i386: Add AVX512 support for STV of SI/DImode rotation by constant.

Following Uros' suggestion, this patch adds support for AVX512VL's
vpro[lr][dq] instructions to the recently added scalar-to-vector (STV)
enhancements to handle DImode and SImode rotations by a constant.

For the test cases:

unsigned long long rot1(unsigned long long x) {
  return (x>>1) | (x<<63);
}

void mem1(unsigned long long *p) {
  *p = rot1(*p);
}

with -m32 -O2 -mavx512vl, we currently generate:

rot1:   movl    4(%esp), %eax
        movl    8(%esp), %edx
        movl    %eax, %ecx
        shrdl   $1, %edx, %eax
        shrdl   $1, %ecx, %edx
        ret

mem1:   movl    4(%esp), %eax
        vmovq   (%eax), %xmm0
        vpshufd $20, %xmm0, %xmm0
        vpsrlq  $1, %xmm0, %xmm0
        vpshufd $136, %xmm0, %xmm0
        vmovq   %xmm0, (%eax)
        ret

with this patch, we now generate:

rot1:   vmovq   4(%esp), %xmm0
        vprorq  $1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

mem1: movl    4(%esp), %eax
        vmovq   (%eax), %xmm0
        vprorq  $1, %xmm0, %xmm0
        vmovq   %xmm0, (%eax)
        ret

2023-07-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL.
(general_scalar_chain::convert_rotate): On TARGET_AVX512F generate
avx512vl_rolv2di or avx412vl_rolv4si when appropriate.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.

d: Merge upstream dmd, druntime 17ccd12af3, phobos 8d3800bee.

D front-end changes:

- Import dmd v2.104.0.
- Assignment-style syntax is now allowed for `alias this'.
- Overloading `extern(C)' functions is now an error.

D runtime changes:

- Import druntime v2.104.0.

Phobos changes:

- Import phobos v2.104.0.
- Better static assert messages when instantiating
`std.algorithm.iteration.permutations' with wrong inputs.
- Added `std.system.instructionSetArchitecture' and
`std.system.ISA'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 17ccd12af3.
* dmd/VERSION: Bump version to v2.104.0.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/apply.o to
d/postordervisitor.o.
* d-codegen.cc (make_location_t): Update for new front-end interface.
(build_filename_from_loc): Likewise.
(build_assert_call): Likewise.
(build_array_bounds_call): Likewise.
(build_bounds_index_condition): Likewise.
(build_bounds_slice_condition): Likewise.
(build_frame_type): Likewise.
(get_frameinfo): Likewise.
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Likewise.
* decl.cc (build_decl_tree): Likewise.
(start_function): Likewise.
* expr.cc (ExprVisitor::visit (NewExp *)): Replace code generation of
`new pointer' with front-end lowering.
* runtime.def (NEWITEMT): Remove.
(NEWITEMIT): Remove.
* toir.cc (IRVisitor::visit (LabelStatement *)): Update for new
front-end interface.
* typeinfo.cc (check_typeinfo_type): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 17ccd12af3.
* src/MERGE: Merge upstream phobos 8d3800bee.

gcc/testsuite/ChangeLog:

* gdc.dg/asm4.d: Update test.

Add pre_reload splitter to detect fp min/max pattern.

We have ix86_expand_sse_fp_minmax to detect min/max sematics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false, for
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
and it failed ix86_expand_sse_fp_minmax.

This patch adds pre_reload splitter to detect the min/max pattern.

Operands order in MINSS matters for signed zero and NANs, since the
instruction always returns second operand when any operand is NAN or
both operands are zero.

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.

Daily bump.

d: Merge upstream dmd, druntime 28a3b24c2e, phobos 8ab95ded5.

D front-end changes:

- Import dmd v2.104.0-beta.1.
- Better error message when attribute inference fails down the
  call stack.
- Using `;' as an empty statement has been turned into an error.
- Using `in' parameters with non- `extern(D)' or `extern(C++)'
  functions is deprecated.
- `in ref' on parameters has been deprecated in favor of
  `-preview=in'.
- Throwing `immutable', `const', `inout', and `shared' qualified
  objects is now deprecated.
- User Defined Attributes now parse Template Arguments.

D runtime changes:

- Import druntime v2.104.0-beta.1.

Phobos changes:

- Import phobos v2.104.0-beta.1.
- Better static assert messages when instantiating
  `std.algorithm.comparison.clamp' with wrong inputs.
- `std.typecons.Rebindable' now supports all types.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 28a3b24c2e.
* dmd/VERSION: Bump version to v2.104.0-beta.1.
* d-codegen.cc (build_bounds_slice_condition): Update for new
front-end interface.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Initialize global.compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation
with new front-end lowering.
(ExprVisitor::visit (LoweredAssignExp *)): New method.
(ExprVisitor::visit (StructLiteralExp *)): Don't generate static
initializer symbols for structs defined in C sources.
* runtime.def (ARRAYCATT): Remove.
(ARRAYCATNTX): Remove.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 28a3b24c2e.
* src/MERGE: Merge upstream phobos 8ab95ded5.

gcc/testsuite/ChangeLog:

* gdc.dg/rtti1.d: Move array concat testcase to ...
* gdc.dg/nogc1.d: ... here.  New test.

Improve dumping of profile_count

Dumps of profile_counts are quite hard to interpret since they are 64bit fixed point
values.  In many cases one looks at a single function and it is better to think of
basic block frequency, that is how many times it is executed each invocatoin. This
patch makes CFG dumps to also print this info.

For example:
main()
{
for (int i = 0; i < 10; i++)
t();
}

the -fdump-tree-optimized-blocks-details now prints:
int main ()
{
  unsigned int ivtmp_1;
  unsigned int ivtmp_2;

;;   basic block 2, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;;    prev block 0, next block 3, flags: (NEW, VISITED)
;;    pred:       ENTRY [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
;;    succ:       3 [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)

;;   basic block 3, loop depth 1, count 976138697 (estimated locally, freq 10.0011), maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       3 [90.0% (guessed)]  count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;;                2 [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
  # ivtmp_2 = PHI <ivtmp_1(3), 10(2)>
  t ();
  ivtmp_1 = ivtmp_2 + 4294967295;
  if (ivtmp_1 != 0)
    goto <bb 3>; [90.00%]
  else
    goto <bb 4>; [10.00%]
;;    succ:       3 [90.0% (guessed)]  count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;;                4 [10.0% (guessed)]  count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;;    prev block 3, next block 1, flags: (NEW, VISITED)
;;    pred:       3 [10.0% (guessed)]  count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)
  return 0;
;;    succ:       EXIT [always]  count:97603128 (estimated locally, freq 1.0000) (EXECUTABLE)

}

Which makes it easier to see that the inner bb is executed 10 times per invocation

gcc/ChangeLog:

* cfg.cc (check_bb_profile): Dump counts with relative frequency.
(dump_edge_info): Likewise.
(dump_bb_info): Likewise.
* profile-count.cc (profile_count::dump): Add comma between quality and
freq.

gcc/testsuite/ChangeLog:

* gcc.dg/predict-22.c: Update template.

Daily bump.

Add missing profile_dump check

gcc/ChangeLog:

PR tree-optimization/110600
* cfgloopmanip.cc (scale_loop_profile): Add mising profile_dump check.

gcc/testsuite/ChangeLog:

PR tree-optimization/110600
* gcc.c-torture/compile/pr110600.c: New test.

Fortran: Fix default type bugs in gfortran [PR99139, PR99368]

2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu>

gcc/fortran
PR fortran/99139
PR fortran/99368
* match.cc (gfc_match_namelist): Check for host associated or
defined types before applying default type.
(gfc_match_select_rank): Apply default type to selector of
unknown type if possible.
* resolve.cc (resolve_fl_variable): Do not apply local default
initialization to assumed rank entities.

gcc/testsuite/
PR fortran/99139
* gfortran.dg/pr99139.f90 : New test

PR fortran/99368
* gfortran.dg/pr99368.f90 : New test

Fix tree-ssa/update-cunroll.c

In this testcase the profile is misupdated before loop has two exits.
The first exit is one eliminated by complete unrolling while second exit remains.
We remove first exit but forget about fact that the source BB of other exit will
then have higher frequency making other exit more likely.

This patch fixes that in duplicate_loop_body_to_header_edge.
While looking into resulting profiles I also noticed that in some cases
scale_loop_profile may drop probabilities to 0 incorrectly either when
trying to update exit from nested loop (which has similar problem) or when the profile
was inconsistent as described in coment bellow.

gcc/ChangeLog:

PR middle-end/110590
* cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within
inner loops and be more careful about inconsistent profiles.
(duplicate_loop_body_to_header_edge): Fix profile update when eliminated
exit is followed by other exit.

gcc/testsuite/ChangeLog:

PR middle-end/110590
* gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail.
* gcc.dg/tree-ssa/update-cunroll.c: Likewise.

Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]

gcc/fortran/ChangeLog:

PR fortran/92178
* trans-expr.cc (gfc_conv_procedure_call): Check procedures for
allocatable dummy arguments with INTENT(OUT) and move deallocation
of actual arguments after evaluation of argument expressions before
the procedure is executed.

gcc/testsuite/ChangeLog:

PR fortran/92178
* gfortran.dg/intent_out_16.f90: New test.
* gfortran.dg/intent_out_17.f90: New test.
* gfortran.dg/intent_out_18.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>

Fortran: simplification of FINDLOC for constant complex arguments [PR110585]

gcc/fortran/ChangeLog:

PR fortran/110585
* arith.cc (gfc_compare_expr): Handle equality comparison of constant
complex gfc_expr arguments.

gcc/testsuite/ChangeLog:

PR fortran/110585
* gfortran.dg/findloc_9.f90: New test.

cprop: Change return type of predicate functions from int to bool

Also change some internal variables from int to bool.

gcc/ChangeLog:

* cprop.cc (reg_available_p): Change return type from int to bool.
(reg_not_set_p): Ditto.
(try_replace_reg): Ditto.  Change "success" variable to bool.
(cprop_jump): Change return type from int to void
and adjust function body accordingly.
(constprop_register): Ditto.
(cprop_insn): Ditto.  Change "changed" variable to bool.
(local_cprop_pass): Change return type from int to void
and adjust function body accordingly.
(bypass_block): Ditto.  Change "change", "may_be_loop_header"
and "removed_p" variables to bool.
(bypass_conditional_jumps): Change return type from int to void
and adjust function body accordingly.  Change "changed"
variable to bool.
(one_cprop_pass): Ditto.

gcse: Change return type of predicate functions from int to bool

Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* gcse.cc (expr_equiv_p): Change return type from int to bool.
(oprs_unchanged_p): Change return type from int to void
and adjust function body accordingly.
(oprs_anticipatable_p): Ditto.
(oprs_available_p): Ditto.
(insert_expr_in_table): Ditto.  Change "antic_p" and "avail_p"
arguments to bool. Change "found" variable to bool.
(load_killed_in_block_p): Change return type from int to void and
adjust function body accordingly.  Change "avail_p" argument to bool.
(pre_expr_reaches_here_p): Change return type from int to void
and adjust function body accordingly.
(pre_delete): Ditto.  Change "changed" variable to bool.
(pre_gcse): Change return type from int to void
and adjust function body accordingly. Change "did_insert" and
"changed" variables to bool.
(one_pre_gcse_pass): Change return type from int to void
and adjust function body accordingly.  Change "changed" variable
to bool.
(should_hoist_expr_to_dom): Change return type from int to void
and adjust function body accordingly.  Change
"visited_allocated_locally" variable to bool.
(hoist_code): Change return type from int to void and adjust
function body accordingly.  Change "changed" variable to bool.
(one_code_hoisting_pass): Ditto.
(pre_edge_insert): Change return type from int to void and adjust
function body accordingly.  Change "did_insert" variable to bool.
(pre_expr_reaches_here_p_work): Change return type from int to void
and adjust function body accordingly.
(simple_mem): Ditto.
(want_to_gcse_p): Change return type from int to void
and adjust function body accordingly.
(can_assign_to_reg_without_clobbers_p): Update function body
for bool return type.
(hash_scan_set): Change "antic_p" and "avail_p" variables to bool.
(pre_insert_copies): Change "added_copy" variable to bool.

doc: Fix typos in Warning Options [PR110596]

gcc/ChangeLog:

PR c++/110595
PR c++/110596
* doc/invoke.texi (Warning Options): Fix typos.

Daily bump.

Dump profile_count along with relative frequency

gcc/ChangeLog:

* profile-count.cc (profile_count::dump): Add FUN
parameter; print relative frequency.
(profile_count::debug): Update.
* profile-count.h (profile_count::dump): Update
prototype.

Fix fallout from re-enabling profile consistency checks.

gcc/testsuite/ChangeLog:

* gcc.dg/pr43864-2.c: Avoid matching pre dump with details-blocks.
* gcc.dg/pr43864-3.c: Likewise.
* gcc.dg/pr43864-4.c: Likewise.
* gcc.dg/pr43864.c: Likewise.
* gcc.dg/unroll-7.c: xfail.

Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

When we collect just user events for autofdo with lbr we get some events where branch
sources are kernel addresses and branch targets are user addresses. Without kernel MMAP
events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if
it can't map at least 95% of events. We sometimes get below this threshold with just
user events. The change is to collect both user events and kernel events.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Collect both kernel and user events for autofdo

i386: Improve __int128 argument passing (in ix86_expand_move).

Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
result in surprising code.  Consider the example below (from PR 43644):

unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
  return x+y;
}

which currently results in 6 consecutive movq instructions:

foo: movq    %rsi, %rax
        movq    %rdi, %rsi
        movq    %rdx, %rcx
        movq    %rax, %rdi
        movq    %rsi, %rax
        movq    %rdi, %rdx
        addq    %rcx, %rax
        adcq    $0, %rdx
        ret

The underlying issue is that during RTL expansion, we generate the
following initial RTL for the x argument:

(insn 4 3 5 2 (set (reg:TI 85)
        (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
        (reg:DI 87)) "pr43644-2.c":5:1 -1
     (nil))
(insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
        (reg:TI 85)) "pr43644-2.c":5:1 -1
     (nil))

which by combine/reload becomes

(insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
        (const_int 0 [0])) "pr43644-2.c":5:1 -1
     (nil))
(insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
        (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 93)
        (nil)))
(insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
        (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 94)
        (nil)))

where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.

The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move.  For insn 4, which
sets a TImode destination from a paradoxical SUBREG, to assign the
lowpart, we can use an explicit zero extension (zero_extendditi2 was
added in July 2022), and for insn 5, which sets the highpart of a
TImode register we can use the *insvti_highpart_1 instruction (that
was added in May 2023, after being approved for stage1 in January).
This allows combine to work its magic, merging these insns into a
*concatditi3 and from there into other optimized forms.

So for the test case above, we now generate only a single movq:

foo: movq    %rdx, %rax
        xorl    %edx, %edx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

But there is a little bad news.  This patch causes two (minor) missed
optimization regressions on x86_64; gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl.  For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting.  These issues are easier to explain and fix once this patch
is in the tree.

The good news is that this approach fixes a number of long standing
issues, that need to checked in bugzilla, including PR target/110533
which was just opened/reported earlier this week.

2023-07-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/43644
PR target/110533
* config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
TImode destinations from paradoxical SUBREGs (setting the lowpart)
into explicit zero extensions.  Use *insvti_highpart_1 instruction
to set the highpart of a TImode destination.

gcc/testsuite/ChangeLog
PR target/43644
PR target/110533
* gcc.target/i386/pr110533.c: New test case.
* gcc.target/i386/pr43644-2.c: Likewise.

d: Fix PR 108842: Cannot use enum array with -fno-druntime

Restrict the generating of CONST_DECLs for D manifest constants to just
scalars without pointers. It shouldn't happen that a reference to a
manifest constant has not been expanded within a function body during
codegen, but it has been found to occur in older versions of the D
front-end (PR98277), so if the decl of a non-scalar constant is
requested, just return its initializer as an expression.

PR d/108842

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar
manifest constants.
(get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest
constants.
* imports.cc (ImportVisitor::visit (VarDeclaration *)): New method.

gcc/testsuite/ChangeLog:

* gdc.dg/pr98277.d: Add more tests.
* gdc.dg/pr108842.d: New test.

Simplify force_edge_cold.

gcc/ChangeLog:

* predict.cc (force_edge_cold): Use
set_edge_probability_and_rescale_others; improve dumps.

Fix some profile consistency testcases

Information about profile mismatches is printed only with -details-blocks for some time.
I think it should be printed even with default to make it easier to spot when someone introduces
new transform that breaks the profile, but I will send separate RFC for that.

This patch enables details in all testcases that greps for Invalid sum.  There are 4 testcases
which fails:
  gcc.dg/tree-ssa/loop-ch-profile-1.c
     here the problem is that loop header dulication introduces loop invariant conditoinal that is later
     updated by tree-ssa-dom but dom does not take care of updating profile.
     Since loop-ch knows when it duplicates loop invariant, we may be able to get this right.

     The test is still useful since it tests that right after ch profile is consistent.
  gcc.dg/tree-prof/update-cunroll-2.c
     This is about profile updating code in duplicate_loop_body_to_header_edge being wrong when optimized
     out exit is not last in the loop.  In that case the probability of later exits needs to be accounted in.
     I will think about making this better - in general this does not seem to have easy solution, but for
     special case of chained tests we can definitely account for the later exits.
  gcc.dg/tree-ssa/update-unroll-1.c
     This fails after aprefetch invoked unrolling.  I did not look into details yet.
  gcc.dg/tree-prof/update-unroll-2.c
     This one seems similar as previous
I decided to xfail these tests and deal with them incrementally and filled in PR110590.

gcc/testsuite/ChangeLog:

* g++.dg/tree-prof/indir-call-prof.C: Add block-details to dump flags.
* gcc.dg/pr43864-2.c: Likewise.
* gcc.dg/pr43864-3.c: Likewise.
* gcc.dg/pr43864-4.c: Likewise.
* gcc.dg/pr43864.c: Likewise.
* gcc.dg/tree-prof/cold_partition_label.c: Likewise.
* gcc.dg/tree-prof/indir-call-prof.c: Likewise.
* gcc.dg/tree-prof/update-cunroll-2.c: Likewise.
* gcc.dg/tree-prof/update-tailcall.c: Likewise.
* gcc.dg/tree-prof/val-prof-1.c: Likewise.
* gcc.dg/tree-prof/val-prof-2.c: Likewise.
* gcc.dg/tree-prof/val-prof-3.c: Likewise.
* gcc.dg/tree-prof/val-prof-4.c: Likewise.
* gcc.dg/tree-prof/val-prof-5.c: Likewise.
* gcc.dg/tree-ssa/fnsplit-1.c: Likewise.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: Likewise.
* gcc.dg/tree-ssa/update-threading.c: Likewise.
* gcc.dg/tree-ssa/update-unswitch-1.c: Likewise.
* gcc.dg/unroll-7.c: Likewise.
* gcc.dg/unroll-8.c: Likewise.
* gfortran.dg/pr25623-2.f90: Likewise.
* gfortran.dg/pr25623.f90: Likewise.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: Likewise; xfail.
* gcc.dg/tree-ssa/update-cunroll.c: Likewise; xfail.
* gcc.dg/tree-ssa/update-unroll-1.c: Likewise; xfail.

Fix epilogue loop profile

Fix two bugs in scale_loop_profile which crept in during my
cleanups and curiously enoug did not show on the testcases we have so far.
The patch also adds the missing call to cap iteration count of the vectorized
loop epilogues.

Vectorizer profile needs more work, but I am trying to chase out obvious bugs first
so the profile quality statistics become meaningful and we can try to improve on them.

Now we get:

Pass dump id and name            |static mismatcdynamic mismatch
                                 |in count     |in count
107t cunrolli                    |      3    +3|        17251       +17251
116t vrp                         |      5    +2|        30908       +16532
118t dce                         |      3    -2|        17251       -13657
127t ch                          |     13   +10|        17251
131t dom                         |     39   +26|        17251
133t isolate-paths               |     47    +8|        17251
134t reassoc                     |     49    +2|        17251
136t forwprop                    |     53    +4|       202501      +185250
159t cddce                       |     61    +8|       216211       +13710
161t ldist                       |     62    +1|       216211
172t ifcvt                       |     66    +4|       373711      +157500
173t vect                        |    143   +77|      9801947     +9428236
176t cunroll                     |    149    +6|     12006408     +2204461
183t loopdone                    |    146    -3|     11944469       -61939
195t fre                         |    142    -4|     11944469
197t dom                         |    141    -1|     13038435     +1093966
199t threadfull                  |    143    +2|     13246410      +207975
200t vrp                         |    145    +2|     13444579      +198169
204t dce                         |    143    -2|     13371315       -73264
206t sink                        |    141    -2|     13371315
211t cddce                       |    147    +6|     13372755        +1440
255t optimized                   |    145    -2|     13372755
256r expand                      |    141    -4|     13371197        -1558
258r into_cfglayout              |    139    -2|     13371197
275r loop2_unroll                |    143    +4|     16792056     +3420859
291r ce2                         |    141    -2|     16811462
312r pro_and_epilogue            |    161   +20|     16873400       +61938
315r jump2                       |    167    +6|     20910158     +4036758
323r bbro                        |    160    -7|     16559844     -4350314

Vect still introduces 77 profile mismatches (same as without this patch)
however subsequent cunroll works much better with 6 new mismatches compared to
78.  Overall it reduces 229 mismatches to 160.

Also overall runtime estimate is now reduced by 6.9%.
Previously the overall runtime estimate grew by 11% which was result of the fat
that the epilogue profile was pretty much the same as profile of the original
loop.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks
after exit.
* tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound
is known.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vect-profile-upate.c: New test.

IBM Z: Fix vec_init default expander

Do not reinitialize vector lanes to zero since they are already
initialized to zero.

gcc/ChangeLog:

* config/s390/s390.cc (vec_init): Fix default case

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-init-3.c: New test.

LRA: Refine reload pseudo class

For given testcase a reload pseudo happened to occur only in reload
insns created on one constraint sub-pass.  Therefore its initial class
(ALL_REGS) was not refined and the reload insns were not processed on
the next constraint sub-passes.  This resulted into the wrong insn.

        PR rtl-optimization/110372

gcc/ChangeLog:

* lra-assigns.cc (assign_by_spills): Add reload insns involving
reload pseudos with non-refined class to be processed on the next
sub-pass.
* lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
(in_class_p): Use it.
(print_curr_insn_alt): New func.
(process_alt_operands): Use it.  Improve debug info.
(curr_insn_transform): Use print_curr_insn_alt.  Refine reload
pseudo class if it is not refined yet.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110372.c: New.

A singleton irange has all known bits.

gcc/ChangeLog:

* value-range.cc (irange::get_bitmask_from_range): Return all the
known bits for a singleton.
(irange::set_range_from_bitmask): Set a range of a singleton when
all bits are known.

The caller to irange::intersect (wide_int, wide_int) must normalize the range.

Per the function comment, the caller to intersect(wide_int, wide_int)
must handle the mask. This means it must also normalize the range if
anything changed.

gcc/ChangeLog:

* value-range.cc (irange::intersect): Leave normalization to
caller.

Implement value/mask tracking for irange.

Integer ranges (irange) currently track known 0 bits.  We've wanted to
track known 1 bits for some time, and instead of tracking known 0 and
known 1's separately, it has been suggested we track a value/mask pair
similarly to what we do for CCP and RTL.  This patch implements such a
thing.

With this we now track a VALUE integer which are the known values, and
a MASK which tells us which bits contain meaningful information.  This
allows us to fix a handful of enhancement requests, such as PR107043
and PR107053.

There is a 4.48% performance penalty for VRP and 0.42% in overall
compilation for this entire patchset.  It is expected and in line
with the loss incurred when we started tracking known 0 bits.

This patch just provides the value/mask tracking support.  All the
nonzero users (range-op, IPA, CCP, etc), are still using the nonzero
nomenclature.  For that matter, this patch reimplements the nonzero
accessors with the value/mask functionality.  In follow-up patches I
will enhance these passes to use the value/mask information, and
fix the aforementioned PRs.

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_value_range): Adjust for
value/mask.
* data-streamer-out.cc (streamer_write_vrange): Same.
* range-op.cc (operator_cast::fold_range): Same.
* value-range-pretty-print.cc
(vrange_printer::print_irange_bitmasks): Same.
* value-range-storage.cc (irange_storage::write_lengths_address):
Same.
(irange_storage::set_irange): Same.
(irange_storage::get_irange): Same.
(irange_storage::size): Same.
(irange_storage::dump): Same.
* value-range-storage.h: Same.
* value-range.cc (debug): New.
(irange_bitmask::dump): New.
(add_vrange): Adjust for value/mask.
(irange::operator=): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::operator==): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(irange::get_nonzero_bits_from_range): Rename to...
(irange::get_bitmask_from_range): ...this.
(irange::set_range_from_nonzero_bits): Rename to...
(irange::set_range_from_bitmask): ...this.
(irange::set_nonzero_bits): Rename to...
(irange::update_bitmask): ...this.
(irange::get_nonzero_bits): Rename to...
(irange::get_bitmask): ...this.
(irange::intersect_nonzero_bits): Rename to...
(irange::intersect_bitmask): ...this.
(irange::union_nonzero_bits): Rename to...
(irange::union_bitmask): ...this.
(irange_bitmask::verify_mask): New.
* value-range.h (class irange_bitmask): New.
(irange_bitmask::set_unknown): New.
(irange_bitmask::unknown_p): New.
(irange_bitmask::irange_bitmask): New.
(irange_bitmask::get_precision): New.
(irange_bitmask::get_nonzero_bits): New.
(irange_bitmask::set_nonzero_bits): New.
(irange_bitmask::operator==): New.
(irange_bitmask::union_): New.
(irange_bitmask::intersect): New.
(class irange): Friend vrange_printer.
(irange::varying_compatible_p): Adjust for bitmask.
(irange::set_varying): Same.
(irange::set_nonzero): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107009.c: Adjust irange dumping for
value/mask changes.
* gcc.dg/tree-ssa/vrp-unreachable.c: Same.
* gcc.dg/tree-ssa/vrp122.c: Same.

x86: slightly correct / simplify *vec_extractv2ti

V2TImode values cannot appear in the upper 16 YMM registers without
AVX512VL being enabled. Therefore forcing 512-bit mode (also not
reflected in the "mode" attribute) is pointless.

gcc/

* config/i386/sse.md (*vec_extractv2ti): Drop g modifiers.

x86: correct / simplify @vec_extract_hi_<mode> and vec_extract_hi_v32qi

The middle alternative each was unusable without enabling AVX512DQ (in
addition to AVX512VL), which is entirely unrelated here. The last
alternative is usable with AVX512VL only (due to type restrictions on
what may be put in the upper 16 YMM registers), and hence is pointlessly
forcing 512-bit mode (without actually reflecting that in the "mode"
attribute).

gcc/

* config/i386/sse.md (@vec_extract_hi_<mode>): Drop last
alternative. Switch new last alternative's "isa" attribute to
"avx512vl".
(vec_extract_hi_v32qi): Likewise.

Closing the GCC 10 branch

contrib/
* gcc-changelog/git_update_version.py: Remove GCC 10 from
active_refs.

maintainer-scripts/
* crontab: Remove entry for GCC 10.

RISC-V: Fix one bug for floating-point static frm

This patch would like to fix one bug to align below items of spec.

RVV floating-point instructions always (implicitly) use the dynamic
rounding mode. This implies that rounding is performed according to the
rounding mode set in the FRM register. The FRM register itself
only holds proper rounding modes and never the dynamic rounding mode.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Robin Dapp <rdapp@ventanamicro.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_mode_set): Avoid emit insn
when FRM_MODE_DYN.
(riscv_mode_entry): Take FRM_MODE_DYN as entry mode.
(riscv_mode_exit): Likewise for exit mode.
(riscv_mode_needed): Likewise for needed mode.
(riscv_mode_after): Likewise for after mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: New test.

RISC-V: Fix one typo of FRM dynamic definition

This patch would like to fix one typo that take rdn instead of dyn by
mistake.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector.md: Fix typo.

Daily bump.

libstdc++: Fix fwrite error parameter

The first parameter of fwrite should be the const char* __s which want
write to FILE *__file, rather than the FILE *__file write to the FILE
*__file.

libstdc++-v3/ChangeLog:

* config/io/basic_file_stdio.cc (xwrite) [USE_STDIO_PURE]: Fix
first argument.

Improve profile updates after loop-ch and cunroll

Extend loop-ch and loop unrolling to fix profile in case the loop is
known to not iterate at all (or iterate few times) while profile claims it
iterates more.  While this is kind of symptomatic fix, it is best we can do
incase profile was originally esitmated incorrectly.

In the testcase the problematic loop is produced by vectorizer and I think
vectorizer should know and account into its costs that vectorizer loop and/or
epilogue is not going to loop after the transformation.  So it would be nice
to fix it on that side, too.

The patch avoids about half of profile mismatches caused by cunroll.

Pass dump id and name            |static mismatcdynamic mismatch
                                 |in count     |in count
107t cunrolli                    |      3    +3|        17251   +17251
115t threadfull                  |      3      |        14376    -2875
116t vrp                         |      5    +2|        30908   +16532
117t dse                         |      5      |        30908
118t dce                         |      3    -2|        17251   -13657
127t ch                          |     13   +10|        17251
131t dom                         |     39   +26|        17251
133t isolate-paths               |     47    +8|        17251
134t reassoc                     |     49    +2|        17251
136t forwprop                    |     53    +4|       202501  +185250
159t cddce                       |     61    +8|       216211   +13710
161t ldist                       |     62    +1|       216211
172t ifcvt                       |     66    +4|       373711  +157500
173t vect                        |    143   +77|      9802097 +9428386
176t cunroll                     |    221   +78|     15639591 +5837494
183t loopdone                    |    218    -3|     15577640   -61951
195t fre                         |    214    -4|     15577640
197t dom                         |    213    -1|     16671606 +1093966
199t threadfull                  |    215    +2|     16879581  +207975
200t vrp                         |    217    +2|     17077750  +198169
204t dce                         |    215    -2|     17004486   -73264
206t sink                        |    213    -2|     17004486
211t cddce                       |    219    +6|     17005926    +1440
255t optimized                   |    217    -2|     17005926
256r expand                      |    210    -7|     19571573 +2565647
258r into_cfglayout              |    208    -2|     19571573
275r loop2_unroll                |    212    +4|     22992432 +3420859
291r ce2                         |    210    -2|     23011838
312r pro_and_epilogue            |    230   +20|     23073776   +61938
315r jump2                       |    236    +6|     27110534 +4036758
323r bbro                        |    229    -7|     21826835 -5283699

W/o the patch cunroll does:

176t cunroll                     |    294  +151|126548439   +116746342

and we end up with 291 mismatches at bbro.

Bootstrapped/regtested x86_64-linux. Plan to commit it after the scale_loop_frequency patch.

gcc/ChangeLog:

PR middle-end/25623
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency to maximal number
of iterations determined.
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise.

gcc/testsuite/ChangeLog:

PR middle-end/25623
* gfortran.dg/pr25623-2.f90: New test.

Improve scale_loop_profile

Original scale_loop_profile was implemented to only handle very simple loops
produced by vectorizer at that time (basically loops with only one exit and no
subloops). It also has not been updated to new profile-count API very carefully.

The function does two thigs
1) scales down the loop profile by a given probability.
    This is useful, for example, to scale down profile after peeling when loop
    body is executed less often than before
2) update profile to cap iteration count by ITERATION_BOUND parameter.

I changed ITERATION_BOUND to be actual bound on number of iterations as
used elsewhere (i.e. number of executions of latch edge) rather then
number of iterations + 1 as it was before.

To do 2) one needs to do the following
  a) scale own loop profile so frquency o header is at most
     the sum of in-edge counts * (iteration_bound + 1)
  b) update loop exit probabilities so their count is the same
     as before scaling.
  c) reduce frequencies of basic blocks after loop exit

old code did b) by setting probability to 1 / iteration_bound which is
correctly only of the basic block containing exit executes precisely one per
iteration (it is not insie other conditional or inner loop).  This is fixed
now by using set_edge_probability_and_rescale_others

aldo c) was implemented only for special case when the exit was just before
latch bacis block.  I now use dominance info to get right some of addional
case.

I still did not try to do anything for multiple exit loops, though the
implementatoin could be generalized.

Bootstrapped/regtested x86_64-linux.  Plan to cmmit it tonight if there
are no complains.

gcc/ChangeLog:

* cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
probability update to be safe on loops with subloops.
Make bound parameter to be iteration bound.
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
of scale_loop_profile.
* tree-vect-loop-manip.cc (vect_do_peeling): Likewise.

Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

If a loop is unrolled by n times during vectoriation, two steps are used to
calculate the induction variable:
  - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
  - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)

This patch calculates an extra vec_n to replace vec_loop:
  vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.

So that we can save the large step register and related operations.

gcc/ChangeLog:

PR tree-optimization/110449
* tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
vec_loop for the unrolled loop.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110449.c: New testcase.

libstdc++: Document --enable-cstdio=stdio_pure [PR110574]

libstdc++-v3/ChangeLog:

PR libstdc++/110574
* doc/xml/manual/configure.xml: Describe stdio_pure argument to
--enable-cstdio.
* doc/html/manual/configure.html: Regenerate.

updat_bb_profile_for_threading TLC

Apply some TLC to update_bb_profile_for_threading.  The function resales
probabilities by:
       FOR_EACH_EDGE (c, ei, bb->succs)
c->probability /= prob;
which is correct but in case prob is 0 (took all execution counts to the newly
constructed path), this leads to undefined results which do not sum to 100%.

In several other plpaces we need to change probability of one edge and rescale
remaining to sum to 100% so I decided to break this off to helper function
set_edge_probability_and_rescale_others

For jump threading the probability of edge is always reduced, so division is right
update, however in general case we also may want to increase probability of the edge
which needs different scalling.  This is bit hard to do staying with probabilities
in range 0...1 for all temporaries.

For this reason I decided to add profile_probability::apply_scale which is symmetric
to what we already have in profile_count::apply_scale and does right thing in
both directions.

Finally I added few early exits so we do not produce confused dumps when
profile is missing and special case the common situation where edges out of BB
are precisely two.  In this case we can set the other edge to inverter probability
which. Saling drop probability quality from PRECISE to ADJUSTED.

Bootstrapped/regtested x86_64-linux. The patch has no effect on in count mismatches
in tramp3d build and improves out-count.  Will commit it shortly.

gcc/ChangeLog:

* cfg.cc (set_edge_probability_and_rescale_others): New function.
(update_bb_profile_for_threading): Use it; simplify the rest.
* cfg.h (set_edge_probability_and_rescale_others): Declare.
* profile-count.h (profile_probability::apply_scale): New.

arc: Update builtin documentation

gcc/ChangeLog:
* doc/extend.texi (ARC Built-in Functions): Update documentation
with missing builtins.

tree-optimization/110556 - tail merging still pre-tuples

The stmt comparison function for GIMPLE_ASSIGNs for tail merging
still looks like it deals with pre-tuples IL. The following
attempts to fix this, not only comparing the first operand (sic!)
of stmts but all of them plus also compare the operation code.

PR tree-optimization/110556
* tree-ssa-tail-merge.cc (gimple_equal_p): Check
assign code and all operands of non-stores.

* gcc.dg/torture/pr110556.c: New testcase.

ada: Add specification source files of runtime units

gcc/ada/

* gcc-interface/Make-lang.in: Add object files of specification
files.

ada: Refactor the proof of the Value and Image runtime units

The aim of this refactoring is to avoid unnecessary dependencies
between Image and Value units even though they share the same
specification functions. These functions are grouped inside ghost
packages which are then withed by Image and Value units.

gcc/ada/

* libgnat/s-vs_int.ads: Instance of Value_I_Spec for Integer.
* libgnat/s-vs_lli.ads: Instance of Value_I_Spec for
Long_Long_Integer.
* libgnat/s-vsllli.ads: Instance of Value_I_Spec for
Long_Long_Long_Integer.
* libgnat/s-vs_uns.ads: Instance of Value_U_Spec for Unsigned.
* libgnat/s-vs_llu.ads: Instance of Value_U_Spec for
Long_Long_Unsigned.
* libgnat/s-vslllu.ads: Instance of Value_U_Spec for
Long_Long_Long_Unsigned.
* libgnat/s-imagei.ads: Take instances of Value_*_Spec as
parameters.
* libgnat/s-imagei.adb: Idem.
* libgnat/s-imageu.ads: Idem.
* libgnat/s-imageu.adb: Idem.
* libgnat/s-valuei.ads: Idem.
* libgnat/s-valuei.adb: Idem.
* libgnat/s-valueu.ads: Idem.
* libgnat/s-valueu.adb: Idem.
* libgnat/s-imgint.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imguns.ads: Adapt instance to new ghost parameters.
* libgnat/s-valint.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallli.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valuns.ads: Adapt instance to new ghost parameters.
* libgnat/s-vaispe.ads: Take instance of Value_U_Spec as parameter
and remove unused declaration.
* libgnat/s-vaispe.adb: Idem.
* libgnat/s-vauspe.ads: Remove unused declaration.
* libgnat/s-valspe.ads: Factor out the specification part of
Val_Util.
* libgnat/s-valspe.adb: Idem.
* libgnat/s-valuti.ads: Move specification to Val_Spec.
* libgnat/s-valuti.adb: Idem.
* libgnat/s-valboo.ads: Use Val_Spec.
* libgnat/s-valboo.adb: Idem.
* libgnat/s-imgboo.adb: Idem.
* libgnat/s-imagef.adb: Adapt instances to new ghost parameters.
* Makefile.rtl: List new files.

ada: Evaluate static expressions in Range attributes

Gigi assumes that the value of range expressions is an integer literal.
Force evaluation of such expressions since static non-literal expressions
are not always evaluated to a literal form by gnat.

gcc/ada/

* sem_attr.adb (analyze_attribute.check_array_type): Replace valid
indexes with their staticly evaluated values.

ada: Refer to non-Ada binding limitations in user guide

The limitation of resetting the FPU mode for non 80-bit
precision was not referenced from "Creating a Stand-alone
Library to be used in a non-Ada context". Reference it the same
way it is already referenced from "Interfacing to C".

gcc/ada/

* doc/gnat_ugn/the_gnat_compilation_model.rst: Reference "Binding
with Non-Ada Main Programs" from "Creating a Stand-alone Library
to be used in a non-Ada context".
* gnat_ugn.texi: Regenerate.

ada: Reuse code in Is_Fully_Initialized_Type

gcc/ada/

* sem_util.adb (Is_Fully_Initialized_Type): Avoid recalculating
the underlying type twice.

ada: Avoid crash in Find_Optional_Prim_Op

Find_Optional_Prim_Op can crash when the Underlying_Type is Empty.
This can happen when you are dealing with a structure type with a
private part that does not have its Full_View set yet.

gcc/ada/

* exp_util.adb (Find_Optional_Prim_Op): Stop deriving primitive
operation if there is no underlying type to derive it from.

ada: Improve error message on violation of SPARK_Mode rules

SPARK_Mode On can only be used on library-level entities.
Improve the error message here.

gcc/ada/

* errout.ads: Add explain code.
* sem_prag.adb (Check_Library_Level_Entity): Refine error message
and add explain code.

ada: Finalization not performed for component of protected type

In some cases involving a discriminated protected type with an array
component that is subject to a discriminant-dependent index constraint,
where the element type of the array requires finalization and the array
type has not yet been frozen at the point of the declaration of the protected
type, finalization of an object of the protected type may incorrectly omit
finalization of the array component. One case where this scenario can arise
is an instantiation of Ada.Containers.Bounded_Synchronized_Queues, passing in
an Element type that requires finalization.

gcc/ada/

* exp_ch7.adb (Make_Final_Call): Add assertion that if no
finalization call is generated, then the type of the object being
finalized does not require finalization.
* freeze.adb (Freeze_Entity): If freezing an already-frozen
subtype, do not assume that nothing needs to be done. In the case
of a frozen subtype of a non-frozen type or subtype (which is
possible), freeze the non-frozen entity.

tree-optimization/110563 - simplify epilogue VF checks

The following consolidates an assert that now hits for ppc64le
with an earlier check we already do, simplifying
vect_determine_partial_vectors_and_peeling and getting rid of
its now redundant argument.

PR tree-optimization/110563
* tree-vectorizer.h (vect_determine_partial_vectors_and_peeling):
Remove second argument.
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Remove for_epilogue_p argument. Merge assert ...
(vect_analyze_loop_2): ... with check done before determining
partial vectors by moving it after.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust.

GGC, GTY: Tighten up a few things re 'reorder' option and strings

..., which doesn't make sense in combination.

This, again, is primarily preparational for another change.

gcc/
* ggc-common.cc (gt_pch_note_reorder, gt_pch_save): Tighten up a
few things re 'reorder' option and strings.
* stringpool.cc (gt_pch_p_S): This is now 'gcc_unreachable'.

GTY: Clean up obsolete parametrized structs remnants

Support removed in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".

gcc/
* gengtype-parse.cc: Clean up obsolete parametrized structs
remnants.
* gengtype.cc: Likewise.
* gengtype.h: Likewise.

GTY: Clean up obsolete 'bool needs_cast_p' field of 'gcc/gengtype.cc:struct walk_type_data'

Last use disappeared in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".

gcc/
* gengtype.cc (struct walk_type_data): Remove 'needs_cast_p'.
Adjust all users.