git.ipfire.org Git - thirdparty/gcc.git/log

c++: mangling tweaks

Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.

The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.

The comment documents existing semantics.

gcc/cp/ChangeLog:

* mangle.cc (abi_check): New.
(write_prefix, write_unqualified_name, write_discriminator)
(write_type, write_member_name, write_expression)
(write_template_arg, write_template_param): Use it.
(start_mangling): Assign from {}.
* cp-tree.h: Update comment.

c++: Add missing auto_diagnostic_groups to constexpr.cc

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing
auto_diagnostic_group.
(cxx_eval_call_expression): Likewise.
(diag_array_subscript): Likewise.
(outside_lifetime_error): Likewise.
(potential_constant_expression_1): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>

RISC-V/testsuite/pr111466.c: update test and expected output

Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them so fix the expected
assembler output which was errorneously introduced by commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

c: error for function with external and internal linkage [PR111708]

Declaring a function with both external and internal linkage
in the same TU is translation-time UB. Add an error for this
case as already done for objects.

PR c/111708

gcc/c/ChangeLog:

* c-decl.cc (grokdeclarator): Add error.

gcc/testsuite/ChangeLog:

* gcc.dg/pr111708-1.c: New test.
* gcc.dg/pr111708-2.c: New test.

Fortran: out of bounds access with nested implied-do IO [PR111837]

gcc/fortran/ChangeLog:

PR fortran/111837
* frontend-passes.cc (traverse_io_block): Dependency check of loop
nest shall be triangular, not banded.

gcc/testsuite/ChangeLog:

PR fortran/111837
* gfortran.dg/implied_do_io_8.f90: New test.

fortran/intrinsic.texi: Improve SIGNAL intrinsic entry

gcc/fortran/ChangeLog:

* intrinsic.texi (signal): Mention that the argument
passed to the signal handler procedure is passed by reference.
Extend example.

MATCH: [PR111432] Simplify `a & (x | CST)` to a when we know that (a & ~CST) == 0

This adds the simplification `a & (x | CST)` to a when we know that
`(a & ~CST) == 0`. In a similar fashion as `a & CST` is handle.

I looked into handling `a | (x & CST)` but that I don't see any decent
simplifications happening.

OK? Bootstrapped and tested on x86_linux-gnu with no regressions.

PR tree-optimization/111432

gcc/ChangeLog:

* match.pd (`a & (x | CST)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-7.c: New test.

LibF7: Re-generate f7-renames.h to pick up white-space from f7renames.sh.

libgcc/config/avr/libf7/
* f7-renames.h: Re-renerate.

tree-cfg: Add count information when creating new bb in move_sese_region_to_fn

This patch makes sure the profile_count information is initialized for the new
bb created in move_sese_region_to_fn.

gcc/ChangeLog:

* tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for
new basic block.

PR modula2/111756: Re-building all-gcc after source changes fails to link

When having modula-2 enabled in a development tree and there are any
changes that trigger rebuilds in m2/ doing a 'make all-gcc' in the
build directory might fail due to lack of dependency tracking.  This
patch introduces build dependencies into gcc/m2/Make-lang.in using -M*
options.  The patch also introduces all -M* options to cc1gm2 and gm2.

gcc/m2/ChangeLog:

PR modula2/111756
* Make-lang.in (CM2DEP): New define conditionally set if
($(CXXDEPMODE),depmode=gcc3).
(GM2_1): Use $(CM2DEP).
(m2/gm2-gcc/%.o): Ensure $(@D)/$(DEPDIR) is created.
Add $(CM2DEP) to the $(COMPILER) command and use $(POSTCOMPILE).
(m2/gm2-gcc/m2configure.o): Ditto.
(m2/gm2-lang.o): Ditto.
(m2/m2pp.o): Ditto.
(m2/gm2-gcc/rtegraph.o): Ditto.
(m2/mc-boot/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot-ch/$(SRC_PREFIX)%.o): Ditto.
(m2/mc-boot/main.o): Ditto.
(mcflex.o): Ditto.
(m2/gm2-libs-boot/M2RTS.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/%.o): Ditto.
(m2/gm2-libs-boot/RTcodummy.o): Ditto.
(m2/gm2-libs-boot/RTintdummy.o): Ditto.
(m2/gm2-libs-boot/wrapc.o): Ditto.
(m2/gm2-libs-boot/UnixArgs.o): Ditto.
(m2/gm2-libs-boot/choosetemp.o): Ditto.
(m2/gm2-libs-boot/errno.o): Ditto.
(m2/gm2-libs-boot/dtoa.o): Ditto.
(m2/gm2-libs-boot/ldtoa.o): Ditto.
(m2/gm2-libs-boot/termios.o): Ditto.
(m2/gm2-libs-boot/SysExceptions.o): Ditto.
(m2/gm2-libs-boot/SysStorage.o): Ditto.
(m2/gm2-compiler-boot/M2GCCDeclare.o): Ditto.
(m2/gm2-compiler-boot/M2Error.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/%.o): Ditto.
(m2/gm2-compiler-boot/m2flex.o): Ditto.
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-compiler/m2flex.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
(m2/gm2-libs/choosetemp.o): Ditto.
(m2/boot-bin/mklink$(exeext)): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/pge-boot/%.o): Ditto.
(m2/gm2-compiler/%.o): Ensure $(@D)/$(DEPDIR) is created and use
$(POSTCOMPILE).
(m2/gm2-compiler/%.o): Ditto.
(m2/gm2-libs-iso/%.o): Ditto.
(m2/gm2-libs/%.o): Ditto.
* README: Purge out of date info.
* gm2-compiler/M2Comp.mod (MakeSaveTempsFileNameExt): Import.
(OnExitDelete): Import.
(GetModuleDefImportStatementList): Import.
(GetModuleModImportStatementList): Import.
(GetImportModule): Import.
(IsImportStatement): Import.
(IsImport): Import.
(GetImportStatementList): Import.
(File): Import.
(Close): Import.
(EOF): Import.
(IsNoError): Import.
(WriteLine): Import.
(WriteChar): Import.
(FlushOutErr): Import.
(WriteS): Import.
(OpenToRead): Import.
(OpenToWrite): Import.
(ReadS): Import.
(WriteS): Import.
(GetM): Import.
(GetMM): Import.
(GetDepTarget): Import.
(GetMF): Import.
(GetMP): Import.
(GetObj): Import.
(GetMD): Import.
(GetMMD): Import.
(GenerateDefDependency): New procedure.
(GenerateDependenciesFromImport): New procedure.
(GenerateDependenciesFromList): New procedure.
(GenerateDependencies): New procedure.
(Compile): Re-write.
(compile): Re-format.
(CreateFileStem): New procedure function.
(DoPass0): Re-write.
(IsLibrary): New procedure function.
(IsUnique): New procedure function.
(Append): New procedure.
(MergeDep): New procedure.
(GetRuleTarget): New procedure function.
(ReadDepContents): New procedure function.
(WriteDep): New procedure.
(WritePhonyDep): New procedure.
(WriteDepContents): New procedure.
(CreateDepFilename): New procedure function.
(Pass0CheckDef): New procedure function.
(Pass0CheckMod): New procedure function.
(DoPass0): Re-write.
(DepContent): New variable.
(DepOutput): New variable.
(BaseName): New procedure function.
* gm2-compiler/M2GCCDeclare.mod (PrintTerse): Handle IsImport.
Replace IsGnuAsmVolatile with IsGnuAsm.
* gm2-compiler/M2Options.def (EXPORT QUALIFIED): Remove list.
(SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Options.mod (SetM): New procedure.
(GetM): New procedure function.
(SetMM): New procedure.
(GetMM): New procedure function.
(SetMF): New procedure.
(GetMF): New procedure function.
(SetPPOnly): New procedure.
(GetB): New procedure function.
(SetMD): New procedure.
(GetMD): New procedure function.
(SetMMD): New procedure.
(GetMMD): New procedure function.
(SetMQ): New procedure.
(SetMT): New procedure.
(GetMT): New procedure function.
(GetDepTarget): New procedure function.
(SetMP): New procedure.
(GetMP): New procedure function.
(SetObj): New procedure.
(SetSaveTempsDir): New procedure.
* gm2-compiler/M2Preprocess.def (PreprocessModule): New parameters
topSource and outputDep.  Re-write.
(MakeSaveTempsFileNameExt): New procedure function.
(OnExitDelete): New procedure function.
* gm2-compiler/M2Preprocess.mod (GetM): Import.
(GetMM): Import.
(OnExitDelete): Add debugging message.
(RemoveFile): Add debugging message.
(BaseName): Remove.
(BuildCommandLineExecute): New procedure function.
* gm2-compiler/M2Search.def (SetDefExtension): Remove unnecessary
spacing.
* gm2-compiler/SymbolTable.mod (GetSymName): Handle ImportSym and
ImportStatementSym.
* gm2-gcc/m2options.h (M2Options_SetMD): New function.
(M2Options_GetMD): New function.
(M2Options_SetMMD): New function.
(M2Options_GetMMD): New function.
(M2Options_SetM): New function.
(M2Options_GetM): New function.
(M2Options_SetMM): New function.
(M2Options_GetMM): New function.
(M2Options_GetMQ): New function.
(M2Options_SetMF): New function.
(M2Options_GetMF): New function.
(M2Options_SetMT): New function.
(M2Options_SetMP): New function.
(M2Options_GetMP): New function.
(M2Options_GetDepTarget): New function.
* gm2-lang.cc (gm2_langhook_init): Correct comment case.
(gm2_langhook_init_options): Add case OPT_M and
OPT_MM.
(gm2_langhook_post_options): Add case OPT_MF, OPT_MT,
OPT_MD and OPT_MMD.
* lang-specs.h (M2CPP): Pass though MF option.
(MDMMD): New define.  Add MDMMD to "@modula-2".

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

tree-optimization/111846 - put simd-clone-info into SLP tree

The following avoids bogously re-using the simd-clone-info we
currently hang off stmt_info from two different SLP contexts where
a different number of lanes should have chosen a different best
simdclone.

PR tree-optimization/111846
* tree-vectorizer.h (_slp_tree::simd_clone_info): Add.
(SLP_TREE_SIMD_CLONE_INFO): New.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_SIMD_CLONE_INFO.
(_slp_tree::~_slp_tree): Release it.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO
dependent on if we're doing SLP.

* gcc.dg/vect/pr111846.c: New testcase.

wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}

The following patch implements printing of wide_int/widest_int numbers
decimally when asked for that using print_dec{,s,u}, even if they have
precision larger than 64 and get_len () above 1 (right now we printed
them hexadecimally and even negative numbers as huge positive hexadecimal).

In order to avoid the expensive division/modulo by 10^19 twice, once to
estimate how many will be needed and another to actually print it, the
patch prints the 19 digit chunks in reverse order (from least significant
to most significant) and then reorders those with linear complexity to form
the right printed number.
Tested with printing both 256 and 320 bit numbers (first as an example
of even number of 19 digit chunks plus one shorter above it, the second
as an example of odd number of 19 digit chunks plus one shorter above it).

The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimatition thinking about it now
is one byte too much (one byte for -, one for '\0') and too conservative,
so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as even less conservative
estimation (though more expensive to compute in inline code).
But that l * HOST_BITS_PER_WIDE_INT / 4 + 4; is likely one byte too much
as well, 2 bytes for 0x, one byte for '\0' and where does the 4th one come
from?  Of course all of these assuming HOST_BITS_PER_WIDE_INT is a multiple
of 64...

2023-10-17  Jakub Jelinek  <jakub@redhat.com>

* wide-int-print.h (print_dec_buf_size): For length, divide number
of bits by 3 and add 3 instead of division by 4 and adding 4.
* wide-int-print.cc (print_decs): Remove superfluous ()s.  Don't call
print_hex, instead call print_decu on either negated value after
printing - or on wi itself.
(print_decu): Don't call print_hex, instead print even large numbers
decimally.
(pp_wide_int_large): Assume len from print_dec_buf_size is big enough
even if it returns false.
* pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
pp_wide_int_large should be used.
* tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
to compute needed buffer size.

RISC-V: Fix failed testcase when use -cmodel=medany

This little path fix a failed testcase when use -cmodel=medany.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-1.c: Split check.

LibF7: Implement fma / fmal.

libgcc/config/avr/libf7/
* libf7.h (F7_SIZEOF): New macro.
* libf7-asm.sx: Use F7_SIZEOF instead of magic number "10".
(F7MOD_D_fma_, __fma): New module and function.
(fma) [-mdouble=64]: Define as alias for __fma.
(fmal) [-mlong-double=64]: Define as alias for __fma.
* libf7-common.mk (F7_ASM_PARTS): Add D_fma.

middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile

The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.

The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles. The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.

PR middle-end/111818
* tree-ssa.cc (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.

* gcc.dg/torture/pr111818.c: New testcase.

tree-optimization/111807 - ICE in verify_sra_access_forest

The following addresses build_reconstructed_reference failing to
build references with a different offset than the models and thus
the caller conditional being off. This manifests when attempting
to build a ref with offset 160 from the model BIT_FIELD_REF <l_4827[9], 8, 0>
onto the same base l_4827 but the models offset being 288. This
cannot work for any kind of ref I can think of, not just with
BIT_FIELD_REFs.

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Only call
build_reconstructed_reference when the offsets are the same.

* gcc.dg/torture/pr111807.c: New testcase.

expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
   5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|                               | # of unexpected case / # of unique unexpected case
|                               |          gcc |          g++ |     gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 /     2 |   72 /    12 |
|    lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this 🙂

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming 😉

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w a6,a1             <-- this goes away
beq a1,zero,.L4
li a5,0
li a0,0
.L3:
addw a4,a2,a5
addw a5,a3,a5
addw a0,a4,a0
bltu a5,a6,.L3
ret
.L4:
li a0,0
ret
```

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
PR target/111466
gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.

gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.

LoongArch: Fix vec_initv32qiv16qi template to avoid ICE.

Following test code triggers unrecognized insn ICE on LoongArch target
with "-O3 -mlasx":

void
foo (unsigned char *dst, unsigned char *src)
{
  for (int y = 0; y < 16; y++)
    {
      for (int x = 0; x < 16; x++)
        dst[x] = src[x] + 1;
      dst += 32;
      src += 32;
    }
}

ICE info:
./test.c: In function ‘foo’:
./test.c:8:1: error: unrecognizable insn:
    8 | }
      | ^
(insn 15 14 16 4 (set (reg:V32QI 185 [ vect__24.7 ])
        (vec_concat:V32QI (reg:V16QI 186)
            (const_vector:V16QI [
                    (const_int 0 [0]) repeated x16
                ]))) "./test.c":4:19 -1
     (nil))
during RTL pass: vregs
./test.c:8:1: internal compiler error: in extract_insn, at recog.cc:2791
0x12028023b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:108
0x12028026f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:116
0x120a03c5b extract_insn(rtx_insn*)
        /home/panchenghui/upstream/gcc/gcc/recog.cc:2791
0x12067ff73 instantiate_virtual_regs_in_insn
        /home/panchenghui/upstream/gcc/gcc/function.cc:1610
0x12067ff73 instantiate_virtual_regs
        /home/panchenghui/upstream/gcc/gcc/function.cc:1983
0x12067ff73 execute
        /home/panchenghui/upstream/gcc/gcc/function.cc:2030

This RTL is generated inside loongarch_expand_vector_group_init function (related
to vec_initv32qiv16qi template). Original impl doesn't ensure all vec_concat arguments
are register type. This patch adds force_reg() to the vec_concat argument generation.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
fix impl related to vec_initv32qiv16qi template to avoid ICE.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c: New test.

LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

There are two reasons for removing this macro definition:
1. The default in the assembler is to use the nop instruction for filling.
2. For assembly directives: .align [abs-expr[, abs-expr[, abs-expr]]]
   The third expression it is the maximum number of bytes that should be
   skipped by this alignment directive.
   Therefore, it will affect the display of the specified alignment rules
   and affect the operating efficiency.

This modification relies on binutils commit 1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler will add nop based on the .align information when doing relax,
it will cause the conditional branch to go out of bounds during the assembly process.
This submission of binutils solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu <xuchenghua@loongson.cn>

RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

Consider this following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}

Before this patch:

bar:
        ble     a3,zero,.L5
        csrr    t0,vlenb
        csrr    a6,vlenb
        slli    t1,t0,3
        vsetvli a5,zero,e32,m4,ta,ma
        sub     sp,sp,t1
        vid.v   v20
        vmv.v.x v12,a1
        vand.vi v4,v20,1
        vmv.v.x v16,a2
        vmseq.vi        v4,v4,1
        slli    t3,a6,2
        vsetvli zero,a5,e32,m4,ta,ma
        vmv1r.v v0,v4
        viota.m v8,v4
        add     a7,t3,sp
        vsetvli a5,zero,e32,m4,ta,mu
        vand.vi v28,v20,-2
        vadd.vi v4,v28,1
        vs4r.v  v20,0(a7)                        -----  spill
        vrgather.vv     v24,v12,v8
        vrgather.vv     v20,v16,v8
        vrgather.vv     v24,v16,v8,v0.t
        vrgather.vv     v20,v12,v8,v0.t
        vs4r.v  v4,0(sp)                          ----- spill
        slli    a3,a3,1
        addi    t4,a6,-1
        neg     t1,a6
        vmv4r.v v0,v20
        vmv.v.i v4,0
        j       .L4
.L13:
        vsetvli a5,zero,e32,m4,ta,ma
.L4:
        mv      a7,a3
        mv      a4,a3
        bleu    a3,a6,.L3
        csrr    a4,vlenb
.L3:
        vmv.v.x v8,t4
        vl4re32.v       v12,0(sp)                ---- spill
        vand.vv v20,v28,v8
        vand.vv v8,v12,v8
        vsetvli zero,a4,e32,m4,ta,ma
        vle32.v v16,0(a0)
        vsetvli a5,zero,e32,m4,ta,ma
        add     a3,a3,t1
        vrgather.vv     v12,v16,v20
        add     a0,a0,t3
        vrgather.vv     v20,v16,v8
        vsub.vv v12,v12,v0
        vsetvli zero,a4,e32,m4,tu,ma
        vadd.vv v4,v4,v12
        vmacc.vv        v4,v24,v20
        bgtu    a7,a6,.L13
        csrr    a1,vlenb
        slli    a1,a1,2
        add     a1,a1,sp
        li      a4,-1
        csrr    t0,vlenb
        vsetvli a5,zero,e32,m4,ta,ma
        vl4re32.v       v12,0(a1)               ---- spill
        vmv.v.i v8,0
        vmul.vx v0,v12,a4
        li      a2,0
        slli    t1,t0,3
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv        v0,v0,v8
        vand.vi v12,v12,1
        vmerge.vvm      v16,v8,v4,v0
        vmseq.vv        v12,v12,v8
        vmv.s.x v1,a2
        vmv1r.v v0,v12
        vredsum.vs      v16,v16,v1
        vmerge.vvm      v8,v8,v4,v0
        vmv.x.s a0,v16
        vredsum.vs      v8,v8,v1
        vmv.x.s a5,v8
        add     sp,sp,t1
        addw    a0,a0,a5
        jr      ra
.L5:
        li      a0,0
        ret

We can there are multiple horrible register spillings.
The root cause of this issue is for a scalar IR load:

_5 = *_4;

We didn't check whether it is a continguous load/store or gather/scatter load/store

Since it will be translate into:

   1. MASK_LEN_GATHER_LOAD (..., perm indice).
   2. Continguous load/store + VEC_PERM (..., perm indice)

It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice)
that we didn't count it before.

So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
               ...
            }

Add one more register consumption if it is not an adjacent load/store.

After this patch, it pick LMUL = 2 which is optimal:

bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li a3,-1
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v6,v4,v2,v0
vredsum.vs v6,v6,v1
vmul.vx v0,v12,a3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmv.x.s a4,v6
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v4,v4,v2,v0
vredsum.vs v4,v4,v1
vmv.x.s a0,v4
addw a0,a0,a4
ret
.L4:
li a0,0
ret

No spillings.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix big LMUL issue.
(get_store_value): New function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.

d: Forbid taking the address of an intrinsic with no implementation

This code fails to link:

    import core.math;
    real function(real) fn = &sin;

However, when called directly, the D intrinsic `sin()' is expanded by
the front-end into the GCC built-in `__builtin_sin()'.  This has been
fixed to now also expand the function when a reference is taken.

As there are D intrinsics and GCC built-ins that don't have a fallback
implementation, raise an error if taking the address is not possible.

gcc/d/ChangeLog:

* d-tree.h (intrinsic_code): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New prototype.
* expr.cc (ExprVisitor::visit (SymOffExp *)): Call
maybe_reject_intrinsic.
* intrinsics.cc (intrinsic_decl): Add fallback field.
(intrinsic_decls): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New function.
* intrinsics.def (DEF_D_LIB_BUILTIN): Update.
(DEF_CTFE_BUILTIN): Update.
(INTRINSIC_BSF): Declare as library builtin.
(INTRINSIC_BSR): Likewise.
(INTRINSIC_BT): Likewise.
(INTRINSIC_BSF64): Likewise.
(INTRINSIC_BSR64): Likewise.
(INTRINSIC_BT64): Likewise.
(INTRINSIC_POPCNT32): Likewise.
(INTRINSIC_POPCNT64): Likewise.
(INTRINSIC_ROL): Likewise.
(INTRINSIC_ROL_TIARG): Likewise.
(INTRINSIC_ROR): Likewise.
(INTRINSIC_ROR_TIARG): Likewise.
(INTRINSIC_ADDS): Likewise.
(INTRINSIC_ADDSL): Likewise.
(INTRINSIC_ADDU): Likewise.
(INTRINSIC_ADDUL): Likewise.
(INTRINSIC_SUBS): Likewise.
(INTRINSIC_SUBSL): Likewise.
(INTRINSIC_SUBU): Likewise.
(INTRINSIC_SUBUL): Likewise.
(INTRINSIC_MULS): Likewise.
(INTRINSIC_MULSL): Likewise.
(INTRINSIC_MULU): Likewise.
(INTRINSIC_MULUI): Likewise.
(INTRINSIC_MULUL): Likewise.
(INTRINSIC_NEGS): Likewise.
(INTRINSIC_NEGSL): Likewise.
(INTRINSIC_TOPRECF): Likewise.
(INTRINSIC_TOPREC): Likewise.
(INTRINSIC_TOPRECL): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/builtins_reject.d: New test.
* gdc.dg/intrinsics_reject.d: New test.

Daily bump.

Fix minor problem in stack probing

probe_stack_range has an assert to capture the possibility that that
expand_binop might not construct its result in the provided target.

We triggered that internally a little while ago. I'm pretty sure it was in the
testsuite, so no new testcase. The fix is easy, copy the result into the
proper target when needed.

Bootstrapped and regression tested on x86.

gcc/
* explow.cc (probe_stack_range): Handle case when expand_binop
does not construct its result in the expected location.

diagnostics: special-case -fdiagnostics-text-art-charset=ascii for LANG=C

In the LWN discussion of the "ASCII" art in GCC 14
https://lwn.net/Articles/946733/#Comments
there was some concern about the use of non-ASCII characters in the
output.

Currently -fdiagnostics-text-art-charset defaults to "emoji".
To better handle older terminals by default, this patch special-cases
LANG=C to use -fdiagnostics-text-art-charset=ascii.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): When LANG=C, update
default for -fdiagnostics-text-art-charset from emoji to ascii.
* doc/invoke.texi (fdiagnostics-text-art-charset): Document the above.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix missing initialization of context->extra_output_kind

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): Ensure
context->extra_output_kind is initialized.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

i386: Allow -mlarge-data-threshold with -mcmodel=large

From: Fangrui Song <maskray@google.com>

When using -mcmodel=medium, large data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song <maskray@google.com>
gcc/ChangeLog:

* config/i386/i386.cc (ix86_can_inline_p):
Handle CM_LARGE and CM_LARGE_PIC.
(x86_elf_aligned_decl_common): Ditto.
(x86_output_aligned_bss): Ditto.
* config/i386/i386.opt: Update doc for -mlarge-data-threshold=.
* doc/invoke.texi: Update doc for -mlarge-data-threshold=.

gcc/testsuite/ChangeLog:

* gcc.target/i386/large-data.c: New test.

RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc

This just moves a few functions out of riscv.cc into riscv-string.cc in an
attempt to keep riscv.cc manageable.  This was originally Christoph's code and
I'm just pushing it on his behalf.

Full disclosure: I built rv64gc after changing to verify everything still
builds.  Given it was just lifting code from one place to another, I didn't run
the testsuite.

gcc/
* config/riscv/riscv-protos.h (emit_block_move): Remove redundant
prototype.  Improve comment.
* config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc
into riscv-string.cc.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved
function.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.

RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up with zfa causing the false fails.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv32gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Implement new RTL optimizations pass: fold-mem-offsets

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.
* gcc.target/i386/pr52146.c: Adjust expected output.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>

d: Merge upstream dmd, druntime 4c18eed967, phobos d945686a4.

D front-end changes:

- Import latest fixes to mainline.

D runtime changes:

- Import latest fixes to mainline.

Phobos changes:

- Import latest fixes to mainline.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4c18eed967.
* d-diagnostic.cc (verrorReport): Update for new front-end interface.
(verrorReportSupplemental): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Likewise.
(d_parse_file): Likewise.
* decl.cc (get_symbol_decl): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4c18eed967.
* src/MERGE: Merge upstream phobos d945686a4.

MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p.

This improves the `A CMP 0 ? A : -A` set of match patterns to use
bitwise_equal_p which allows an nop cast between signed and unsigned.
This allows catching a few extra cases which were not being caught before.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/101541
* match.pd (A CMP 0 ? A : -A): Improve
using bitwise_equal_p.

gcc/testsuite/ChangeLog:

PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-36.c: New test.
* gcc.dg/tree-ssa/phi-opt-37.c: New test.

[PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast inbetween ~ and a/b

Currently we able to simplify `~a CMP ~b` to `b CMP a` but we should allow a nop
conversion in between the `~` and the `a` which can show up. A similarly thing should
be done for `~a CMP CST`.

I had originally submitted the `~a CMP CST` case as
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html;
I noticed we should do the same thing for the `~a CMP ~b` case and combined
it with that one here.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/31531

gcc/ChangeLog:

* match.pd (~X op ~Y): Allow for an optional nop convert.
(~X op C): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr31531-1.c: New test.
* gcc.dg/tree-ssa/pr31531-2.c: New test.

c++: improve fold-expr location

I want to distinguish between constraint && and fold-expressions there of
written by the user and those implied by template parameter
type-constraints; to that end, let's improve our EXPR_LOCATION for an
explicit fold-expression.

The fold3.C change is needed because this moves the caret from the end of
the expression to the operator, which means the location of the error refers
to the macro invocation rather than the macro definition; both locations are
still printed, but which one is an error and which a note changes.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_fold_expression): Track location range.
* semantics.cc (finish_unary_fold_expr)
(finish_left_unary_fold_expr, finish_right_unary_fold_expr)
(finish_binary_fold_expr): Add location parm.
* constraint.cc (finish_shorthand_constraint): Pass it.
* pt.cc (convert_generic_types_to_packs): Likewise.
* cp-tree.h: Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic3.C: Add expected column.
* g++.dg/cpp1z/fold3.C: Adjust diagnostic lines.

c++: fix truncated diagnostic in C++23 [PR111272]

In C++23, since P2448, a constexpr function F that calls a non-constexpr
function N is OK as long as we don't actually call F in a constexpr
context.  So instead of giving an error in maybe_save_constexpr_fundef,
we only give an error when evaluating the call.  Unfortunately, as shown
in this PR, the diagnostic can be truncated:

z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
   10 |   constexpr Jam() { ft(); }
      |             ^~~

...because what?  With this patch, we say:

z.C:10:13: note: 'constexpr Jam::Jam()' is not usable as a 'constexpr' function because:
   10 |   constexpr Jam() { ft(); }
      |             ^~~
z.C:10:23: error: call to non-'constexpr' function 'int Jam::ft()'
   10 |   constexpr Jam() { ft(); }
      |                     ~~^~
z.C:8:7: note: 'int Jam::ft()' declared here
    8 |   int ft() { return 42; }
      |       ^~

Like maybe_save_constexpr_fundef, explain_invalid_constexpr_fn should
also check the body of a constructor, not just the mem-initializer.

PR c++/111272

gcc/cp/ChangeLog:

* constexpr.cc (explain_invalid_constexpr_fn): Also check the body of
a constructor in C++14 and up.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-diag1.C: New test.

ARC: Split asl dst,1,src into bset dst,0,src to implement 1<<x.

This patch adds a pre-reload splitter to arc.md, to use the bset (set
specific bit instruction) to implement 1<<x (i.e. left shifts of one)
on ARC processors that don't have a barrel shifter.

Currently,

int foo(int x) {
  return 1 << x;
}

when compiled with -O2 -mcpu=em is compiled as a loop:

foo: mov_s   r2,1    ;3
        and.f lp_count,r0, 0x1f
        lpnz    2f
        add r2,r2,r2
        nop
2:      # end single insn loop
        j_s.d   [blink]
        mov_s   r0,r2   ;4

with this patch we instead generate a single instruction:

foo: bset    r0,0,r0
        j_s     [blink]

2023-10-16  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to
use bset dst,0,src to implement 1<<x on !TARGET_BARREL_SHIFTER.

s390: Fix expander popcountv8hi2_vx

The normal form of a CONST_INT which represents an integer of a mode
with fewer bits than in HOST_WIDE_INT is sign extended. This even holds
for unsigned integers.

This fixes an ICE during cse1 where we bail out at rtl.h:2297 since
INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold.

gcc/ChangeLog:

* config/s390/vector.md (popcountv8hi2_vx): Sign extend each
unsigned vector element.

RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

void
foo8 (int64_t *restrict a)
{
for (int i = 0; i < 16; ++i)
a[i] = a[i]-16;
}

We use VLS modes instead of VLA modes even it is specified by dynamic LMUL.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Use VLS modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: New test.

use more get_range_query

For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried.
For "get_range_query", it could get more context-aware range info.
And look at the implementation of "get_range_query", it returns
global range if no local fun info.

So, if not quering for SSA_NAME and not chaning the IL, it would
be ok to use get_range_query to replace get_global_range_query.

gcc/ChangeLog:

* fold-const.cc (expr_not_equal_to): Replace get_global_range_query
by get_range_query.
* gimple-fold.cc (size_must_be_zero_p): Likewise.
* gimple-range-fold.cc (fur_source::fur_source): Likewise.
* gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.

Support 32/64-bit vectorization for conversion between _Float16 and integer/float.

gcc/ChangeLog:

* config/i386/mmx.md (V2FI_32): New mode iterator
(movd_v2hf_to_sse): Rename to ..
(movd_<mode>_to_sse): .. this.
(movd_v2hf_to_sse_reg): Rename to ..
(movd_<mode>_to_sse_reg): .. this.
(fix<fixunssuffix>_trunc<mode><mmxintvecmodelower>2): New
expander.
(fix<fixunssuffix>_truncv2hfv2si2): Ditto.
(float<floatunssuffix><mmxintvecmodelower><mode>2): Ditto.
(float<floatunssuffix>v2siv2hf2): Ditto.
(extendv2hfv2sf2): Ditto.
(truncv2sfv2hf2): Ditto.
* config/i386/sse.md (*vec_concatv8hf_movss): Rename to ..
(*vec_concat<mode>_movss): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-hf-convert-1.c: New test.

Enable vectorization for V2HF/V4HF rounding operations and sqrt.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_sse_copysign_to_positive):
Handle HFmode.
(ix86_expand_round_sse4): Ditto.
* config/i386/i386.md (roundhf2): New expander.
(lroundhf<mode>2): Ditto.
(lrinthf<mode>2): Ditto.
(l<rounding_insn>hf<mode>2): Ditto.
* config/i386/mmx.md (sqrt<mode>2): Ditto.
(btrunc<mode>2): Ditto.
(nearbyint<mode>2): Ditto.
(rint<mode>2): Ditto.
(lrint<mode><mmxintvecmodelower>2): Ditto.
(floor<mode>2): Ditto.
(lfloor<mode><mmxintvecmodelower>2): Ditto.
(ceil<mode>2): Ditto.
(lceil<mode><mmxintvecmodelower>2): Ditto.
(round<mode>2): Ditto.
(lround<mode><mmxintvecmodelower>2): Ditto.
* config/i386/sse.md (lrint<mode><sseintvecmodelower>2): Ditto.
(lfloor<mode><sseintvecmodelower>2): Ditto.
(lceil<mode><sseintvecmodelower>2): Ditto.
(lround<mode><sseintvecmodelower>2): Ditto.
(sse4_1_round<ssescalarmodesuffix>): Extend to V8HF.
(round<mode>2): Extend to V8HF/V16HF/V32HF.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-roundhf.c: New test.
* gcc.target/i386/part-vect-sqrtph-1.c: New test.

Daily bump.

libgomp.texi: Update "Enabling OpenMP" + OpenACC / invoke.texi: -fopenacc/-fopenmp update

The OpenACC specification does not mention the '!$ ' sentinel for conditional
compilation and the feature was removed in r11-5572-g1d6f6ac693a860
for PR fortran/98011; update libgomp.texi for this and update a leftover
comment. - Additionally, some other updates are done as well.

libgomp/
* libgomp.texi (Enabling OpenMP): Update for C/C++ attributes;
improve wording especially for Fortran; mention -fopenmp-simd.
(Enabling OpenACC): Minor cleanup; remove conditional compilation
sentinel.

gcc/
* doc/invoke.texi (-fopenacc, -fopenmp, -fopenmp-simd): Use @samp not
@code; document more completely the supported Fortran sentinels.

gcc/fortran
* scanner.cc (skip_free_comments, skip_fixed_comments): Remove
leftover 'OpenACC' from comments about OpenMP's conditional
compilation sentinel.

libgomp.texi: Improve "OpenACC Environment Variables"

None of the ACC_* env vars was documented; in particular, the valid valids
for ACC_DEVICE_TYPE found to be lacking as those are not document in the
OpenACC spec.
GCC_ACC_NOTIFY was removed as I failed to find any traces of it but the
addition to the documentation in commit r6-6185-gcdf6119dad04dd
("libgomp.texi: Updates for OpenACC."). It seems to be planned as GCC
version of the ACC_NOTIFY env var used by another compiler for offloading
debugging.

libgomp/
* libgomp.texi (ACC_DEVICE_TYPE, ACC_DEVICE_NUM, ACC_PROFLIB):
Actually document what the function does.
(GCC_ACC_NOTIFY): Remove unused env var.

libgomp.texi: Use present not future tense

libgomp/ChangeLog:

* libgomp.texi: Replace most future tense by present tense.

sim: add distclean dep for gnulib

ChangeLog:

* Makefile.def: Add distclean-sim dependency on distclean-gnulib.
* Makefile.in: Regenerate.

middle-end: Improved RTL expansion of 1LL << x.

This patch improves the initial RTL expanded for double word shifts
on architectures with conditional moves, so that later passes don't
need to clean-up unnecessary and/or unused instructions.

Consider the general case, x << y, which is expanded well as:

t1 = y & 32;
t2 = 0;
t3 = x_lo >> 1;
t4 = y ^ ~0;
t5 = t3 >> t4;
tmp_hi = x_hi << y;
tmp_hi |= t5;
tmp_lo = x_lo << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;

which is nearly optimal, the only thing that can be improved is
that using a unary NOT operation "t4 = ~y" is better than XOR
with -1, on targets that support it.  [Note the one_cmpl_optab
expander didn't fall back to XOR when this code was originally
written, but has been improved since].

Now consider the relatively common idiom of 1LL << y, which
currently produces the RTL equivalent of:

t1 = y & 32;
t2 = 0;
t3 = 1 >> 1;
t4 = y ^ ~0;
t5 = t3 >> t4;
tmp_hi = 0 << y;
tmp_hi |= t5;
tmp_lo = 1 << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;

Notice here that t3 is always zero, so the assignment of t5
is a variable shift of zero, which expands to a loop on many
smaller targets, a similar shift by zero in the first tmp_hi
assignment (another loop), that the value of t4 is no longer
required (as t3 is zero), and that the ultimate value of tmp_hi
is always zero.

Fortunately, for many (but perhaps not all) targets this mess
gets cleaned up by later optimization passes.  However, this
patch avoids generating unnecessary RTL at expand time, by
calling simplify_expand_binop instead of expand_binop, and
avoiding generating dead or unnecessary code when intermediate
values are known to be zero.  For the 1LL << y test case above,
we now generate:

t1 = y & 32;
t2 = 0;
tmp_hi = 0;
tmp_lo = 1 << y;
out_hi = t1 ? tmp_lo : tmp_hi;
out_lo = t1 ? t2 : tmp_lo;

On arc-elf, for example, there are 18 RTL INSN_P instructions
generated by expand before this patch, but only 12 with this patch
(improving both compile-time and memory usage).

2023-10-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* optabs.cc (expand_subword_shift): Call simplify_expand_binop
instead of expand_binop.  Optimize cases (i.e. avoid generating
RTL) when CARRIES or INTO_INPUT is zero.  Use one_cmpl_optab
(i.e. NOT) instead of xor_optab with ~0 to calculate ~OP1.

modula2: Add m2.etags rule to gcc/m2/Make-lang.in

This patch adds the m2.etags rule to gcc/m2/Make-lang.in which
generates etags for the .cc .c .h files within gcc/m2.

gcc/m2/ChangeLog:

* Make-lang.in (m2.tags): New rule.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

wide-int: Fix estimation of buffer sizes for wide_int printing [PR111800]

As mentioned in the PR, my estimations on needed buffer size for wide_int
and especially widest_int printing were incorrect, I've used get_len ()
in the estimations, but that is true only for !wi::neg_p (x) values.
Under the hood, we have 3 ways to print numbers.
print_decs which if
  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
      || (wi.get_len () == 1))
uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive or
negative) and otherwise uses print_hex,
print_decu which if
  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
      || (wi.get_len () == 1 && !wi::neg_p (wi)))
uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
only) and print_hex, which doesn't print most significant limbs which are
zero and the first limb which is non-zero prints such that redundant 0
hex digits aren't printed, while all limbs below that are printed with
"%016" PRIx64.  For wi::neg_p (x) values, the first limb of the precision
is always non-zero, so we print all the limbs for the precision.
So, the current estimations are accurate if !wi::neg_p (x), or when
print_decs will be used and x.get_len () == 1, otherwise we need to use
estimation based on get_precision () rather than get_len ().

I've introduced new inlines print_{dec{,s,u},hex}_buf_size which compute the
needed buffer length in bytes and return true if WIDE_INT_PRINT_BUFFER_SIZE
isn't sufficient and caller should XALLOCAVEC the buffer.

2023-10-15  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111800
gcc/
* wide-int-print.h (print_dec_buf_size, print_decs_buf_size,
print_decu_buf_size, print_hex_buf_size): New inline functions.
* wide-int.cc (assert_deceq): Use print_dec_buf_size.
(assert_hexeq): Use print_hex_buf_size.
* wide-int-print.cc (print_decs): Use print_decs_buf_size.
(print_decu): Use print_decu_buf_size.
(print_hex): Use print_hex_buf_size.
(pp_wide_int_large): Use print_dec_buf_size.
* value-range.cc (irange_bitmask::dump): Use print_hex_buf_size.
* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
print_dec_buf_size.  Use TYPE_SIGN macro in print_dec call argument.
gcc/c-family/
* c-warn.cc (match_case_to_enum_1): Assert w.get_precision ()
is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than
w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS.

d: Merge upstream dmd, druntime f9efc98fd7, phobos a3f22129d.

D front-end changes:

- Import dmd v2.105.2.
- A function with enum storage class is now deprecated.
- Global variables can now be initialized with Associative
  Arrays.
- Improvements for the C++ header generation of static variables
  used in a default argument context.

D runtime changes:

- Import druntime v2.105.2.
- The `core.memory.GC' functions `GC.enable', `GC.disable',
  `GC.collect', and `GC.minimize' `have been marked `@safe'.

Phobos changes:

- Import phobos v2.105.2.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd f9efc98fd7.
* dmd/VERSION: Bump version to v2.105.2.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
* d-diagnostic.cc (verrorReport): Don't emit tips when error gagging
is turned on.
* d-lang.cc (d_handle_option): Remove obsolete parameter.
(d_post_options): Likewise.
(d_read_ddoc_files): New function.
(d_generate_ddoc_file): New function.
(d_parse_file): Update for new front-end interface.
* expr.cc (ExprVisitor::visit (AssocArrayLiteralExp *)): Check for new
front-end lowering of static associative arrays.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime f9efc98fd7.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/newaa.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos a3f22129d.
* testsuite/libphobos.hash/test_hash.d: Update test.
* testsuite/libphobos.phobos/phobos.exp: Add compiler flags
-Wno-deprecated.
* testsuite/libphobos.phobos_shared/phobos_shared.exp: Likewise.

gcc/testsuite/ChangeLog:

* lib/gdc-utils.exp (gdc-convert-args): Handle new compiler options.

combine: Fix handling of unsigned constants

If a CONST_INT represents an integer of a mode with fewer bits than in
HOST_WIDE_INT, then the integer is sign extended. For those two
optimizations touched by this patch, the integers of interest have only
the most significant bit set w.r.t their mode, therefore, they were sign
extended. Thus in order to get the integer of interest, we have to chop
off the high bits.

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Fix handling of unsigned
constants.

RISC-V: Fix vsingle attribute

RVVM2x2QI should be rvvm2qi instead of rvvmq1i.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix vsingle incorrect attribute for RVVM2x2QI.

Daily bump.

libgomp.fortran/allocate-6.f90: Run with -fdump-tree-gimple

libgomp/
* testsuite/libgomp.fortran/allocate-6.f90: Add missing
dg-additional-options "-fdump-tree-gimple"; fix scan.

Fix ICE in set_cell_span, at text-art/table.cc:148 with D front-end and -fanalyzer

The internal error in analyzer turned out to be caused by a subtly
invalid tree representation of STRING_CSTs in the D front-end, fixed by
including the terminating NULL as part of the TREE_STRING_POINTER.

When adding a first analyzer test for D, it flagged up another subtle
mismatch in one assignment in the module support routines as well, fixed
by generating the correct field type for the compiler-generated struct.

PR d/111537

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (StringExp *)): Include null terminator
in STRING_CST string.
* modules.cc (get_compiler_dso_type): Generate ModuleInfo** type for
the minfo fields.

gcc/testsuite/ChangeLog:

* gdc.dg/analyzer/analyzer.exp: New test.
* gdc.dg/analyzer/pr111537.d: New test.

d: Reduce code duplication of writing generated files.

Small refactoring ahead of the next merge from upstream, where a few
more front-end routines will stop doing the file handling themselves.

gcc/d/ChangeLog:

* d-lang.cc (d_write_file): New function.
(d_parse_file): Reduce code duplication.

libgomp.texi: Note to 'Memory allocation' sect and missing mem-memory routines

This commit completes the documentation of the OpenMP memory-management
routines, except for the unimplemented TR11 additions. It also makes clear
in the 'Memory allocation' section of the 'OpenMP-Implementation Specifics'
chapter under which condition OpenMP managed memory/allocators are used.

libgomp/ChangeLog:

* libgomp.texi: Fix some typos.
(Memory Management Routines): Document remaining 5.x routines.
(Memory allocation): Make clear when the section applies.

Fortran: Support OpenMP's 'allocate' directive for stack vars

gcc/fortran/ChangeLog:

* gfortran.h (ext_attr_t): Add omp_allocate flag.
* match.cc (gfc_free_omp_namelist): Void deleting same
u2.allocator multiple times now that a sequence can use
the same one.
* openmp.cc (gfc_match_omp_clauses, gfc_match_omp_allocate): Use
same allocator expr multiple times.
(is_predefined_allocator): Make static.
(gfc_resolve_omp_allocate): Update/extend restriction checks;
remove sorry message.
(resolve_omp_clauses): Reject corarrays in allocate/allocators
directive.
* parse.cc (check_omp_allocate_stmt): Permit procedure pointers
here (rejected later) for less misleading diagnostic.
* trans-array.cc (gfc_trans_auto_array_allocation): Propagate
size for GOMP_alloc and location to which it should be added to.
* trans-decl.cc (gfc_trans_deferred_vars): Handle 'omp allocate'
for stack variables; sorry for static variables/common blocks.
* trans-openmp.cc (gfc_trans_omp_clauses): Evaluate 'allocate'
clause's allocator only once; fix adding expressions to the
block.
(gfc_trans_omp_single): Pass a block to gfc_trans_omp_clauses.

gcc/ChangeLog:

* gimplify.cc (gimplify_bind_expr): Handle Fortran's
'omp allocate' for stack variables.

libgomp/ChangeLog:

* libgomp.texi (OpenMP Impl. Status): Mention that Fortran now
supports the allocate directive for stack variables.
* testsuite/libgomp.fortran/allocate-5.f90: New test.
* testsuite/libgomp.fortran/allocate-6.f90: New test.
* testsuite/libgomp.fortran/allocate-7.f90: New test.
* testsuite/libgomp.fortran/allocate-8.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/allocate-14.c: Fix directive name.
* c-c++-common/gomp/allocate-15.c: Likewise.
* c-c++-common/gomp/allocate-9.c: Fix comment typo.
* gfortran.dg/gomp/allocate-4.f90: Remove sorry dg-error.
* gfortran.dg/gomp/allocate-7.f90: Likewise.
* gfortran.dg/gomp/allocate-10.f90: New test.
* gfortran.dg/gomp/allocate-11.f90: New test.
* gfortran.dg/gomp/allocate-12.f90: New test.
* gfortran.dg/gomp/allocate-13.f90: New test.
* gfortran.dg/gomp/allocate-14.f90: New test.
* gfortran.dg/gomp/allocate-15.f90: New test.
* gfortran.dg/gomp/allocate-8.f90: New test.
* gfortran.dg/gomp/allocate-9.f90: New test.

middle-end: Allow _BitInt(65535) [PR102989]

The following patch lifts further restrictions which limited _BitInt to at
most 16319 bits up to 65535.
The problem was mainly in INTEGER_CST representation, which had 3
unsigned char members to describe lengths in number of 64-bit limbs, which
it wanted to fit into 32 bits.  This patch removes the third one which was
just a cache to save a few compile time cycles for wi::to_offset and
enlarges the other two members to unsigned short.
Furthermore, the same problem has been in some uses of trailing_wide_int*
(in value-range-storage*) and value-range-storage* itself, while other
uses of trailing_wide_int* have been fine (e.g. CONST_POLY_INT, where no
constants will be larger than 3/5/9/11 limbs depending on target, so 255
limit is plenty).  The patch turns all those length representations to be
unsigned short for consistency, so value-range-storage* can handle even
16320-65535 bits BITINT_TYPE ranges.  The cc1plus growth is about 16K,
so not really significant for 38M .text section.

Note, the reason for the new limit is
  unsigned int precision : 16;
TYPE_PRECISION limit, if we wanted to overcome that, TYPE_PRECISION would
need to use some other member for BITINT_TYPE from all the others and
we could reach that way 4194239 limit (65535 * 64 - 1, again implied by
INTEGER_CST and value-range-storage*).  Dunno if that is
worth it or if it is something we want to do for GCC 14 though.

2023-10-14  Jakub Jelinek  <jakub@redhat.com>

PR c/102989
gcc/
* tree-core.h (struct tree_base): Remove int_length.offset
member, change type of int_length.unextended and int_length.extended
from unsigned char to unsigned short.
* tree.h (TREE_INT_CST_OFFSET_NUNITS): Remove.
(wi::extended_tree <N>::get_len): Don't use TREE_INT_CST_OFFSET_NUNITS,
instead compute it at runtime from TREE_INT_CST_EXT_NUNITS and
TREE_INT_CST_NUNITS.
* tree.cc (wide_int_to_tree_1): Don't assert
TREE_INT_CST_OFFSET_NUNITS value.
(make_int_cst): Don't initialize TREE_INT_CST_OFFSET_NUNITS.
* wide-int.h (WIDE_INT_MAX_ELTS): Change from 255 to 1024.
(WIDEST_INT_MAX_ELTS): Change from 510 to 2048, adjust comment.
(trailing_wide_int_storage): Change m_len type from unsigned char *
to unsigned short *.
(trailing_wide_int_storage::trailing_wide_int_storage): Change second
argument from unsigned char * to unsigned short *.
(trailing_wide_ints): Change m_max_len type from unsigned char to
unsigned short.  Change m_len element type from
struct{unsigned char len;} to unsigned short.
(trailing_wide_ints <N>::operator []): Remove .len from m_len
accesses.
* value-range-storage.h (irange_storage::lengths_address): Change
return type from const unsigned char * to const unsigned short *.
(irange_storage::write_lengths_address): Change return type from
unsigned char * to unsigned short *.
* value-range-storage.cc (irange_storage::write_lengths_address):
Likewise.
(irange_storage::lengths_address): Change return type from
const unsigned char * to const unsigned short *.
(write_wide_int): Change len argument type from unsigned char *&
to unsigned short *&.
(irange_storage::set_irange): Change len variable type from
unsigned char * to unsigned short *.
(read_wide_int): Change len argument type from unsigned char to
unsigned short.  Use trailing_wide_int_storage <unsigned short>
instead of trailing_wide_int_storage and
trailing_wide_int <unsigned short> instead of trailing_wide_int.
(irange_storage::get_irange): Change len variable type from
unsigned char * to unsigned short *.
(irange_storage::size): Multiply n by sizeof (unsigned short)
in len_size variable initialization.
(irange_storage::dump): Change len variable type from
unsigned char * to unsigned short *.
gcc/cp/
* module.cc (trees_out::start, trees_in::start): Remove
TREE_INT_CST_OFFSET_NUNITS handling.
gcc/testsuite/
* gcc.dg/bitint-38.c: Change into dg-do run test, in addition
to checking the addition, division and right shift results at compile
time check it also at runtime.
* gcc.dg/bitint-39.c: New test.

RISC-V: Remove redundant iterators.

These iterators are redundant, removed and commmitted.
gcc/ChangeLog:

* config/riscv/vector-iterators.md: Remove redundant iterators.

Daily bump.

Fortran: name conflict between internal procedure and derived type [PR104351]

gcc/fortran/ChangeLog:

PR fortran/104351
* decl.cc (get_proc_name): Extend name conflict detection between
internal procedure and previous declaration also to derived type.

gcc/testsuite/ChangeLog:

PR fortran/104351
* gfortran.dg/derived_function_interface_1.f90: Adjust pattern.
* gfortran.dg/pr104351.f90: New test.

fortran: fix handling of options -ffpe-trap and -ffpe-summary [PR110957]

gcc/fortran/ChangeLog:

PR fortran/110957
* invoke.texi: Update documentation to reflect '-ffpe-trap=none'.
* options.cc (gfc_handle_fpe_option): Fix mixup up of error messages
for options -ffpe-trap and -ffpe-summary. Accept '-ffpe-trap=none'
to clear FPU traps previously set on command line.

Do not add partial equivalences with no uses.

PR tree-optimization/111622
* value-relation.cc (equiv_oracle::add_partial_equiv): Do not
register a partial equivalence if an operand has no uses.

OMP SIMD inbranch call vectorization for AVX512 style masks

The following teaches vectorizable_simd_clone_call to handle
integer mode masks. The tricky bit is to second-guess the
number of lanes represented by a single mask argument - the following
uses simdlen and the number of mask arguments to calculate that,
assuming ABIs have them uniform.

Similar to the VOIDmode handling there's a restriction on not
supporting splitting/merging of incoming vector masks to
more/less SIMD call arguments.

PR tree-optimization/111795
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
integer mode mask arguments.

* gcc.target/i386/vect-simd-clone-avx512-1.c: New testcase.
* gcc.target/i386/vect-simd-clone-avx512-2.c: Likewise.
* gcc.target/i386/vect-simd-clone-avx512-3.c: Likewise.

Add support for SLP vectorization of OpenMP SIMD clone calls

This adds support for SLP vectorization of OpenMP SIMD clone calls.
There's a complication when vectorizing calls involving virtual
operands since this is now for the first time not only leafs (loads
or stores).  With SLP this runs into the issue that placement of
the vectorized stmts is not necessarily at one of the original
scalar stmts which leads to the magic updating virtual operands
in vect_finish_stmt_generation not working.  So we run into the
assert that updating virtual operands isn't necessary.  I've
papered over this similar to how we do for mismatched const/pure
attribution by setting vinfo->any_known_not_updated_vssa.

I've added two basic testcases with multi-lane SLP and verified
that with single-lane SLP enabled the rest of the existing testcases
pass.

* tree-vect-slp.cc (mask_call_maps): New.
(vect_get_operand_map): Handle IFN_MASK_CALL.
(vect_build_slp_tree_1): Likewise.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
SLP.

* gcc.dg/vect/slp-simd-clone-1.c: New testcase.
* gcc.dg/vect/slp-simd-clone-2.c: Likewise.

RISC-V Regression: Fix FAIL of bb-slp-68.c for RVV

Like comment said, this test failed on 64 bytes vector.
Both RVV and GCN has 64 bytes vector.

So it's more reasonable to use vect512.
gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-68.c: Use vect512.

RISC-V: Refine run test cases of math autovec

For the run test cases of math autovec, we need a reference value to
check if the return value is expected or not.

The previous patch leverage hardcode for the reference value but we
can leverage the scalar math function instead. For example ceil after
autovec.

ASSERT (CEIL (Vector {1.2,...}) == Vector {2.0, ...});

But we can leverage the scalar math function to avoid potential mistakes.

ASSERT (CEIL (Vector {1.2,...}) == Vector {ceil (1.2), ...});

This patch remove some fflags check as it covered by check-body already.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c:
Use scalar func as reference instead of hardcode.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for FP llfloor auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

long long llfloor (double);

This patch would like to add the test cases for ensuring the
correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llfloor-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for FP ifloor auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

int ifloor (float);

This patch would like to add the test cases for ensuring the
correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ifloor-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for FP iceil auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

int iceil (float);

This patch would like to add the test cases for ensuring the
correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-iceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-iceil-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for FP llceil auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

long long llceil (double);

This patch would like to add the test cases for ensuring the
correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

C99 testsuite readiness: Some verified test case adjustments

The updated test cases still reproduce the bugs with old compilers.

gcc/testsuite/

* gcc.c-torture/compile/pc44485.c (func_21): Add missing cast.
* gcc.c-torture/compile/pr106101.c: Use builtins to avoid
calls to undeclared functions. Change type of yyvsp to
char ** and introduce yyvsp1 to avoid type errors.
* gcc.c-torture/execute/pr111331-1.c: Add missing int.
* gcc.dg/pr100512.c: Unreduce test case and suppress only
-Wpointer-to-int-cast.
* gcc.dg/pr103003.c: Likewise.
* gcc.dg/pr103451.c: Add cast to long and suppress
-Wdiv-by-zero only.
* gcc.dg/pr68435.c: Avoid implicit int and missing
static function implementation warning.

C99 test suite readiness: Some unverified test case adjustments

These changes are assumed not to interfere with the test objective,
but it was not possible to reproduce the historic test case failures
(with or without the modification here).

gcc/testsuite/

* gcc.c-torture/compile/20000105-1.c: Add missing int return type.
Call __builtin_exit instead of exit.
* gcc.c-torture/compile/20000105-2.c: Add missing void types.
* gcc.c-torture/compile/20000211-1.c (Lstream_fputc, Lstream_write)
(Lstream_flush_out, parse_doprnt_spec): Add missing function
declaration.
* gcc.c-torture/compile/20000224-1.c (call_critical_lisp_code):
Declare.
* gcc.c-torture/compile/20000314-2.c: Add missing void types.
* gcc.c-torture/compile/980816-1.c (XtVaCreateManagedWidget)
(XtAddCallback): Likewise.
* gcc.c-torture/compile/pr49474.c: Use struct
gfc_formal_arglist * instead of (implied) int type.
* gcc.c-torture/execute/20001111-1.c (foo): Add cast to
char *.
(main): Call __builtin_abort and __builtin_exit.

C99 test suite readiness: Mark some C89 tests

Add -std=gnu89 to some tests which evidently target C89-only language
features.

gcc/testsuite/

* gcc.c-torture/compile/920501-11.c: Compile with -std=gnu89.
* gcc.c-torture/compile/920501-23.c: Likewise.
* gcc.c-torture/compile/920501-8.c: Likewise.
* gcc.c-torture/compile/920701-1.c: Likewise.
* gcc.c-torture/compile/930529-1.c: Likewise.

or1k: Fix -Wincompatible-pointer-types warning during libgcc build

libgcc/

* config/or1k/linux-unwind.h (or1k_fallback_frame_state): Add
missing cast.

arc: Fix -Wincompatible-pointer-types warning during libgcc build

libgcc/

* config/arc/linux-unwind.h (arc_fallback_frame_state): Add
missing cast.

riscv: Fix -Wincompatible-pointer-types warning during libgcc build

libgcc/

* config/riscv/linux-unwind.h (riscv_fallback_frame_state): Add
missing cast.

csky: Fix -Wincompatible-pointer-types warning during libgcc build

libgcc/

* config/csky/linux-unwind.h (csky_fallback_frame_state): Add
missing cast.

m68k: Avoid implicit function declaration in libgcc

libgcc/

* config/m68k/fpgnulib.c (__cmpdf2): Declare.

libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc

The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change which marked vfscanf with nonnull (1) attribute.
While vfwscanf hasn't been marked similarly (strangely), the patch changes
that too. By using va_arg one hides the value of it from the compiler
(volatile keyword would do too, or making the FILE* stream a function
argument, but then it might need to be guarded by #if or something).

2023-10-13 Jakub Jelinek <jakub@redhat.com>

* testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
Initialize stream to va_arg(ap, FILE*) rather than 0.
* testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
Likewise.

tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

The following handles byte-aligned, power-of-two and byte-multiple
sized BIT_FIELD_REF reads in SRA. In particular this should cover
BIT_FIELD_REFs created by optimize_bit_field_compare.

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
appearing there leading to more DSE, fully eliding the aggregates.

This results in the same false positive -Wuninitialized as the
older attempt to remove the folding from optimize_bit_field_compare,
fixed by initializing part of the aggregate unconditionally.

PR tree-optimization/111779
gcc/
* tree-sra.cc (sra_handled_bf_read_p): New function.
(build_access_from_expr_1): Handle some BIT_FIELD_REFs.
(sra_modify_expr): Likewise.
(make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
* trans-expr.cc (gfc_trans_assignment_1): Initialize
lhs_caf_attr and rhs_caf_attr codimension flag to avoid
false positive -Wuninitialized.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
* gcc.dg/vect/vect-pr111779.c: New testcase.

tree-optimization/111773 - avoid CD-DCE of noreturn special calls

The support to elide calls to allocation functions in DCE runs into
the issue that when implementations are discovered noreturn we end
up DCEing the calls anyway, leaving blocks without termination and
without outgoing edges which is both invalid IL and wrong-code when
as in the example the noreturn call would throw. The following
avoids taking advantage of both noreturn and the ability to elide
allocation at the same time.

For the testcase it's valid to throw or return 10 by eliding the
allocation. But we have to do either where currently we'd run
off the function.

PR tree-optimization/111773
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
not elide noreturn calls that are reflected to the IL.

* g++.dg/torture/pr111773.C: New testcase.

RISC-V: Add test for FP llround auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

long long llround (double);

This patch would like to add the test cases for ensuring the correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llround-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

Like ARM SVE and GCN, add RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

RISC-V: Add test for FP iroundf auto vectorization

The below FP API are supported already by sharing the same standard
name, as well as the machine mode.

int iroundf (float);

This patch would like to add the test cases for ensuring the
correctness.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-iround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-iround-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

riscv_legitimize_poly_move was expected to ensure the poly value is at most 32
times smaller than the minimal VLEN (32 being derived from '4096 / 128').
This assumption held when our mode modeling was not so precisely defined.
However, now that we have modeled the mode size according to the correct minimal
VLEN info, the size difference between different RVV modes can be up to 64
times. For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1]
versus [64, 64] respectively.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump
max_power to 64.
* config/riscv/riscv.h (MAX_POLY_VARIANT): New.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-01.C: New.
* g++.target/riscv/rvv/rvv.exp: Add autovec folder.

RISC-V: Leverage stdint-gcc.h for RVV test cases

Leverage stdint-gcc.h for the int64_t types instead of typedef.
Or we may have conflict with stdint-gcc.h in somewhere else.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include
stdint-gcc.h for int types.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t
typedef.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Support FP lfloor/lfloorf auto vectorization

This patch would like to support the FP lfloor/lfloorf auto vectorization.

* long lfloor (double) for rv64
* long lfloorf (float) for rv32

Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lfloormn2 only act on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lfloor (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lfloor (in[i]);
}

Before this patch:
.L3:
  ...
  fld         fa5,0(a1)
  fcvt.l.d    a5,fa5,rdn
  sd          a5,-8(a0)
  ...
  bne         a1,a4,.L3

After this patch:
  frrm        a6
  ...
  fsrmi       2 // RDN
.L3:
  ...
  vsetvli     a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli     zero,a2,e64,m1,ta,ma
  vse32.v     v1,0(a0)
  ...
  bne         a2,zero,.L3
  ...
  fsrm        a6

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lfloor<mode><v_i_l_ll_convert>2): New
pattern for lfloor/lfloorf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lfloor): New func decl for expanding lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
for expanding lfloor.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

testsuite: Replace many dg-require-thread-fence with dg-require-atomic-cmpxchg-word

These tests actually use a form of atomic compare and exchange
operation, not just atomic loading and storing. Some targets (not
supported by e.g. libatomic) have atomic loading and storing, but not
compare and exchange, yielding linker errors for missing library
functions.

This change is just for existing uses of
dg-require-thread-fence. It does not fix any other tests
that should also be gated on dg-require-atomic-cmpxchg-word.

* testsuite/29_atomics/atomic/compare_exchange_padding.cc,
testsuite/29_atomics/atomic_flag/clear/1.cc,
testsuite/29_atomics/atomic_flag/cons/value_init.cc,
testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc,
testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc,
testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc,
testsuite/29_atomics/atomic_ref/generic.cc,
testsuite/29_atomics/atomic_ref/integral.cc,
testsuite/29_atomics/atomic_ref/pointer.cc: Replace
dg-require-thread-fence with dg-require-atomic-cmpxchg-word.

testsuite: Add dg-require-atomic-cmpxchg-word

Some targets (armv6-m) support inline atomic load and store,
i.e. dg-require-thread-fence matches, but not atomic operations like
compare and exchange.

This directive can be used to replace uses of dg-require-thread-fence
where an atomic operation is actually used.

* testsuite/lib/dg-options.exp (dg-require-atomic-cmpxchg-word):
New proc.
* testsuite/lib/libstdc++.exp (check_v3_target_atomic_cmpxchg_word):
Ditto.

Daily bump.

RISC-V: Support FP lceil/lceilf auto vectorization

This patch would like to support the FP lceil/lceilf auto vectorization.

* long lceil (double) for rv64
* long lceilf (float) for rv32

Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lceilmn2 only act on DF => DI for
rv64, and SF => SI for rv32.

Given we have code like:

void
test_lceil (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lceil (in[i]);
}

Before this patch:
.L3:
  ...
  fld         fa5,0(a1)
  fcvt.l.d    a5,fa5,rup
  sd          a5,-8(a0)
  ...
  bne         a1,a4,.L3

After this patch:
  frrm        a6
  ...
  fsrmi       3 // RUP
.L3:
  ...
  vsetvli     a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli     zero,a2,e64,m1,ta,ma
  vse32.v     v1,0(a0)
  ...
  bne         a2,zero,.L3
  ...
  fsrm        a6

The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lceil<mode><v_i_l_ll_convert>2): New
pattern] for lceil/lceilf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lceil): New func decl for expanding lceil.
* config/riscv/riscv-v.cc (expand_vec_lceil): New func impl
for expanding lceil.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

PR111778, PowerPC: Do not depend on an undefined shift

I was building a cross compiler to PowerPC on my x86_86 workstation with the
latest version of GCC on October 11th.  I could not build the compiler on the
x86_64 system as it died in building libgcc.  I looked into it, and I
discovered the compiler was recursing until it ran out of stack space.  If I
build a native compiler with the same sources on a PowerPC system, it builds
fine.

I traced this down to a change made around October 10th:

| commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
| Author: Jiufu Guo <guojiufu@linux.ibm.com>
| Date:   Tue Jan 10 20:52:33 2023 +0800
|
|   rs6000: build constant via li/lis;rldicl/rldicr
|
|   If a constant is possible left/right cleaned on a rotated value from
|   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
|   to build the constant.

The code was doing a -1 << 64 which is undefined behavior because different
machines produce different results.  On the x86_64 system, (-1 << 64) produces
-1 while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The x86_64 then
recurses until the stack runs out of space.

If I apply this patch, the compiler builds fine on both x86_64 as a PowerPC
crosss compiler and on a native PowerPC system.

2023-10-12  Michael Meissner  <meissner@linux.ibm.com>

gcc/

PR target/111778
* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
code from shifts that are undefined.
(can_be_built_by_li_lis_and_rldicr): Likewise.
(can_be_built_by_li_and_rldic): Protect code from shifts that
undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.

libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory

In OpenMP 5.0/5.1, the semantic of OMP_TARGET_OFFLOAD=mandatory was
insufficiently specified; 5.2 clarified this with extensions/clarifications
(omp_initial_device, omp_invalid_device, "conforming device number").
GCC's implementation matches OpenMP 5.2.

libgomp/ChangeLog:

* libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to
OMP_TARGET_OFFLOAD.
(OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE;
clarify MANDATORY behavior.

reg-notes.def: Fix up description of REG_NOALIAS

The description of the REG_NOALIAS note in reg-notes.def isn't quite
right. It describes it as being attached to call insns, but it is
instead attached to a move insn receiving the return value from a call.

This can be seen by looking at the code in calls.cc:expand_call which
attaches the note:

  emit_move_insn (temp, valreg);

  /* The return value from a malloc-like function cannot alias
     anything else.  */
  last = get_last_insn ();
  add_reg_note (last, REG_NOALIAS, temp);

gcc/ChangeLog:

* reg-notes.def (NOALIAS): Correct comment.

RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering

Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")
A recent change broke the xtheadcondmov-indirect tests, because the order of
emitted instructions changed. Since the test is too strict when testing for
a fixed instruction order, let's change the tests to simply count instruction,
like it is done for similar tests.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
instruction reordering.

wide-int: Fix build with gcc < 12 or clang++ [PR111787]

While my wide_int patch bootstrapped/regtested fine when I used GCC 12
as system gcc, apparently it doesn't with GCC 11 and older or clang++.
For GCC before PR96555 C++ DR1315 implementation the compiler complains
about template argument involving template parameters, for clang++ the
same + complains about missing needs_write_val_arg static data member
in some wi::int_traits specializations.

2023-10-12 Jakub Jelinek <jakub@redhat.com>

PR bootstrap/111787
* tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg): New
static data member.
(int_traits <extended_tree <N>>::needs_write_val_arg): Likewise.
(wi::ints_for): Provide separate partial specializations for
generic_wide_int <extended_tree <N>> and INL_CONST_PRECISION or that
and CONST_PRECISION, rather than using
int_traits <extended_tree <N> >::precision_type as the second template
argument.
* rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New
static data member.
* double-int.h (wi::int_traits <double_int>::needs_write_val_arg):
Likewise.

RISCV: Bugfix for incorrect documentation heading nesting

PR middle-end/111777

gcc/ChangeLog:
* doc/extend.texi: Change subsubsection to subsection for
CORE-V built-ins.

AArch64: Fix Armv9-a warnings that get emitted whenever a ACLE header is used.

At the moment, trying to use -march=armv9-a with any ACLE header such as
arm_neon.h results in rows and rows of warnings saying:

<built-in>: warning: "__ARM_ARCH" redefined
<built-in>: note: this is the location of the previous definition

This is obviously not useful and happens because the header was defined at
__ARM_ARCH == 8 and the commandline changes it.

The Arm port solves this by undef the macro during argument processing and we do
the same on AArch64 for the majority of macros. However we define this macro
using a different helper which requires the manual undef.

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add undef.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/armv9_warning.c: New test.

wide-int: Add simple CHECKING_P stack-protector canary like checking

This patch adds hopefully not so expensive --enable-checking=yes
verification that the widest_int upper length bound estimates are really
upper bounds and nothing attempts to write more elements.
It is done only if the estimated upper length bound is smaller than
WIDE_INT_MAX_INL_ELTS, but that should be the most common case unless
large _BitInt is involved.

2023-10-12 Jakub Jelinek <jakub@redhat.com>

* wide-int.h (widest_int_storage <N>::write_val): If l is small
and there is space in u.val array, store a canary value at the
end when checking.
(widest_int_storage <N>::set_len): Check the canary hasn't been
overwritten.