git.ipfire.org Git - thirdparty/gcc.git/log

ada: Fix discrepancy in expansion of untagged record equality

The expansion of the predefined equality operator for untagged record types
can be done either in line, i.e. into the component-wise comparison of the
operands, or out of line, i.e. into a call to a function implementing this
comparison, and the heuristics of the selection are essentially based on the
complexity of the implementation.

For discriminated record types with a variant part, which comprise unchecked
union types, the expansion is always done out of line. For nondiscriminated
types, the expansion is done in line, unless one of the components is of a
record type for which a user-defined equality operator exists, in which case
the expansion is done out of line.

For the third case, i.e. discriminated record types without a variant part,
the expansion is always done in line. Now given that the discriminants are
considered as mere components for the purpose of predefined equality in this
case, there does not seem to be any reason for treating it differently from
the second case above.

gcc/ada/

* exp_ch3.adb (Build_Untagged_Equality): Rename into...
(Build_Untagged_Record_Equality): ...this.
(Expand_Freeze_Record_Type): Adjust to above renaming and invoke
the procedure also for discriminated types without a variant part.

ada: Fix small inaccuracy in implementation of B.3.3(20/2)

This is the clause about inferable discriminants in unchecked unions.

gcc/ada/

* sem_util.adb (Has_Inferable_Discriminants): In the case of a
component with a per-object constraint, also return true if the
enclosing object is not of an unchecked union type.
In the default case, remove a useless call to Base_Type.

PR modula2/110125 variables reported as uninitialized when set inside WITH

The modula-2 static analysis incorrectly identifies variables as
uninitialized if they are initialized within a WITH statement.  This bug
fix re-implements the variable static analysis and will detect simple
pointer record fields being accessed before being initialized.
The static analysis is limited to the first basic block in a procedure.
It does not check variant records, arrays or sets.  A new option
-Wuninit-variable-checking will turn on the new semantic checking
(-Wall also enables the new checking).

gcc/ChangeLog:

PR modula2/110125
* doc/gm2.texi (Semantic checking): Include examples using
-Wuninit-variable-checking.

gcc/m2/ChangeLog:

PR modula2/110125
* Make-lang.in (GM2-COMP-BOOT-DEFS): Add M2SymInit.def.
(GM2-COMP-BOOT-MODS): Add M2SymInit.mod.
* gm2-compiler/M2BasicBlock.mod: Formatting changes.
* gm2-compiler/M2Code.mod: Remove import of VariableAnalysis from
M2Quads.  Import VariableAnalysis from M2SymInit.mod.
* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
Add debugging print for a component.
(TypeConstFullyDeclared): Call RememberType for every type.
* gm2-compiler/M2GenGCC.mod (CodeReturnValue): Add parameter to
GetQuadOtok.
(CodeBecomes): Add parameter to GetQuadOtok.
(CodeXIndr): Add parameter to GetQuadOtok.
* gm2-compiler/M2Optimize.mod (ReduceBranch): Reformat and
preserve operand token positions when reducing the branch
quadruples.
(ReduceGoto): Reformat.
(FoldMultipleGoto): Reformat.
(KnownReachable): Reformat.
* gm2-compiler/M2Options.def (UninitVariableChecking): New
variable declared and exported.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Options.mod (SetWall): Set
UninitVariableChecking.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Quads.def (PutQuadOtok): Exported and declared.
(VariableAnalysis): Removed.
* gm2-compiler/M2Quads.mod (PutQuadOtok): New procedure.
(doVal): Reformatted.
(MarkAsWrite): Reformatted.
(MarkArrayAsWritten): Reformatted.
(doIndrX): Use PutQuadOtok.
(MakeRightValue): Use GenQuadOtok.
(MakeLeftValue): Use GenQuadOtok.
(CheckReadBeforeInitialized): Remove.
(IsNeverAltered): Reformat.
(DebugLocation): New procedure.
(BuildDesignatorPointer): Use GenQuadO to preserve operand token
position.
(BuildRelOp): Use GenQuadOtok ditto.
* gm2-compiler/SymbolTable.def (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
* gm2-compiler/SymbolTable.mod (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
(LRInitDesc): New type.
(SymVar): InitState new field.
(MakeVar): Initialize InitState.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New function declaration.
* gm2-lang.cc (gm2_langhook_handle_option): Detect
OPT_Wuninit_variable_checking and call SetUninitVariableChecking.
* lang.opt: Add Wuninit-variable-checking.
* gm2-compiler/M2SymInit.def: New file.
* gm2-compiler/M2SymInit.mod: New file.

gcc/testsuite/ChangeLog:

PR modula2/110125
* gm2/switches/uninit-variable-checking/fail/testinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallvec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithnoptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))
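
For illustration only, the scalar loops below are a sketch of the kind of source these patterns are meant to cover, assuming float inputs widened to a double accumulator. The function names are hypothetical and are not taken from the testcases.

```c
#include <assert.h>
#include <stddef.h>

/* vfwnmsac-like: acc = -(a * b) + acc, with a and b widened first.  */
void
widen_fnma (double *restrict acc, const float *restrict a,
            const float *restrict b, size_t n)
{
  assert (acc && a && b);
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) a[i] * (double) b[i]) + acc[i];
}

/* vfwmsac-like: acc = (a * b) - acc.  */
void
widen_fms (double *restrict acc, const float *restrict a,
           const float *restrict b, size_t n)
{
  assert (acc && a && b);
  for (size_t i = 0; i < n; i++)
    acc[i] = (double) a[i] * (double) b[i] - acc[i];
}

/* vfwnmacc-like: acc = -(a * b) - acc.  */
void
widen_fnms (double *restrict acc, const float *restrict a,
            const float *restrict b, size_t n)
{
  assert (acc && a && b);
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) a[i] * (double) b[i]) - acc[i];
}
```

Whether the vectorizer and combine actually select the widening instructions for such loops depends on the target options in use.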

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

RISC-V: Support vfwmul.vv combine lowering

Consider the following complicated case:
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
        dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
        dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
        dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
        dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }

TEST_TYPE (double, float)

In such a complicated situation, the combine pass cannot combine the extensions of both operands in one go.
Instead, it first tries to combine one of the extensions and then combines
the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1)))
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (reg 0) (reg 3)))

First step of combine:
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (float_extend: (reg 1)) (reg 3)))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1)) (float_extend: (reg 2))))

So, to enhance the combine optimization, we add a "pseudo vfwmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
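
As a sketch only, here is a hand expansion of the loop for TYPE1 = double and TYPE2 = float, with the extensions pulled out into named temporaries. The function name widen_mul4 and the temporaries are hypothetical. It makes visible that each extended value feeds more than one multiplication, which is presumably what makes the case "complicated" for combine.

```c
#include <assert.h>

/* Hypothetical hand expansion for TYPE1 = double, TYPE2 = float.  */
void
widen_mul4 (double *restrict dst, double *restrict dst2,
            double *restrict dst3, double *restrict dst4,
            const float *restrict a, const float *restrict b,
            const float *restrict a2, const float *restrict b2, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      double wa = (double) a[i];   /* float_extend, used three times.  */
      double wb = (double) b[i];   /* float_extend, used twice.  */
      double wa2 = (double) a2[i]; /* float_extend, used twice.  */
      double wb2 = (double) b2[i]; /* float_extend, used once.  */
      dst[i] = wa * wb;
      dst2[i] = wa2 * wb;
      dst3[i] = wa2 * wa;
      dst4[i] = wa * wb2;
    }
}
```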

gcc/ChangeLog:

* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@"
into "*" in pattern name which simplifies build files.
(*pred_single_widen_mul<any_extend:su><mode>): Ditto.
(*pred_single_widen_mul<mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.

aarch64: Fix vector-to-vector vec_extract

The documentation says:

-------------------------------------------------------------------------
@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
@item @samp{vec_extract@var{m}@var{n}}
Extract given field from the vector value.  [...]  The
@var{n} mode is the mode of the field or vector of fields that should be
extracted, [...]
If @var{n} is a vector mode, the index is counted in units of that mode.
-------------------------------------------------------------------------

However, Robin pointed out that, in practice, the index is counted
in whole multiples of @var{n}.  These are the semantics that x86
and target-independent code follow.

This patch updates the aarch64 pattern to match, which also removes
the FAIL.  I think Robin has patches that update the documentation
and make more use of the de facto semantics.

I haven't found an existing testcase that shows the difference.
We do now use the pattern for:

union u { int32x4_t x; int32x2_t y[2]; };
int32x2_t f(int32x4_t x) { union u u = { x }; return u.y[1]; }

but we were already generating perfect code for it.  Because of that,
it didn't really seem worth adding a specific dump test.
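
A plain C model of the de facto index semantics, for illustration only: the index selects a whole sub-vector rather than an element. The helper name extract_subvector is hypothetical, this is not GCC code, and lane ordering and endianness are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy sub-vector number IDX, of SUB_LEN elements, out of a vector of
   SRC_LEN elements.  IDX counts whole sub-vectors, not elements.  */
void
extract_subvector (int32_t *dst, const int32_t *src, size_t src_len,
                   size_t sub_len, size_t idx)
{
  assert (sub_len > 0 && src_len % sub_len == 0);
  assert (idx < src_len / sub_len);
  for (size_t i = 0; i < sub_len; i++)
    dst[i] = src[idx * sub_len + i];
}
```

Under this model, extracting a two-element half from a four-element vector only ever uses index 0 or 1, which corresponds to the "0 or 1" expectation in the ChangeLog entry below.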

gcc/
* config/aarch64/aarch64-simd.md (vec_extract<mode><Vhalf>): Expect
the index to be 0 or 1.

Revert "RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering"

This reverts commit 47e6dcb597b2d4abcab13c9dea0cc7d2131b6419.

RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

RISC-V: Fix one typo of FRM dynamic definition

This patch fixes a typo where rdn was used instead of dyn by
mistake.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector.md: Fix typo.

tree-optimization/110506 - ICE in pattern recog with TYPE_PRECISION

The following re-orders checks to make sure we check TYPE_PRECISION
on an integral type.

PR tree-optimization/110506
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Re-order
TYPE_PRECISION access with INTEGRAL_TYPE_P check.

* gcc.dg/pr110506-2.c: New testcase.

tree-optimization/110506 - bogus non-zero mask in CCP for vector types

get_value_for_expr was blindly using TYPE_PRECISION to produce
a mask for vector typed entities which the new tree checking now
catches.

PR tree-optimization/110506
* tree-ssa-ccp.cc (get_value_for_expr): Check for integral
type before relying on TYPE_PRECISION to produce a nonzero mask.

* gcc.dg/pr110506.c: New testcase.

testsuite: Add vect_float_strict to testcase [PR 110381]

As discussed in the PR, the testcase needs
/* { dg-require-effective-target vect_float_strict } */

2023-02-03 Andrew Pinski <apinski@marvell.com>

PR tree-optimization/110381
gcc/testsuite/

* gcc.dg/vect/pr110381.c: Add vect_float_strict.

MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain conditions

This patch makes mips16e2 generate ZEB/ZEH instead of ANDI
under the -O0 option, as it already does at -O1 through -O3,
which shrinks the code size.
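
For illustration, these are the two masks that amount to a zero-extension of the low byte or halfword, which is the operation ZEB and ZEH perform. The function names are hypothetical; which instruction the compiler picks depends on the target options.

```c
#include <assert.h>
#include <stdint.h>

/* Masking with 0xff keeps the low byte: a zero-extend-byte.  */
uint32_t
zero_extend_byte (uint32_t x)
{
  uint32_t masked = x & 0xffu;
  assert (masked == (uint8_t) x); /* Same value as a narrowing cast.  */
  return masked;
}

/* Masking with 0xffff keeps the low halfword: a zero-extend-half.  */
uint32_t
zero_extend_half (uint32_t x)
{
  uint32_t masked = x & 0xffffu;
  assert (masked == (uint16_t) x);
  return masked;
}
```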

gcc/ChangeLog:
* config/mips/mips.md (*and<mode>3_mips16): Generates
ZEB/ZEH instructions.

MIPS: Add CACHE instruction for mips16e2

This patch adds CACHE instruction from mips16e2
with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc (mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Defined a new macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.

MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2

The MIPS16e2 ASE has PREF, LL and SC instructions
that use a 9-bit immediate, like mips32r6.
Pre-R6 MIPS32 uses a 16-bit immediate.
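
To make the difference in reach concrete, here is a small range check, assuming the immediates are signed offsets. The helper name fits_signed_imm is hypothetical and is not a GCC function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if OFFSET fits in a signed immediate of BITS bits.  */
bool
fits_signed_imm (int32_t offset, unsigned bits)
{
  assert (bits >= 1 && bits <= 31);
  int32_t lo = -(INT32_C (1) << (bits - 1));
  int32_t hi = (INT32_C (1) << (bits - 1)) - 1;
  return offset >= lo && offset <= hi;
}
```

Under that assumption, a 9-bit field covers offsets from -256 to 255, while a 16-bit field covers -32768 to 32767.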

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.

MIPS: Add load/store word left/right instructions for mips16e2

This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.
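
For context, LWL/LWR and SWL/SWR are the classic MIPS way of accessing an unaligned word. The sketch below shows a portable unaligned word access in C. The function names are hypothetical, and whether the compiler turns these into LWL/LWR or SWL/SWR depends on the target and options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit word from a possibly unaligned address.  */
uint32_t
load_u32_unaligned (const void *p)
{
  uint32_t v;
  assert (p != NULL);
  memcpy (&v, p, sizeof v);
  return v;
}

/* Store a 32-bit word to a possibly unaligned address.  */
void
store_u32_unaligned (void *p, uint32_t v)
{
  assert (p != NULL);
  memcpy (p, &v, sizeof v);
}
```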

gcc/ChangeLog:

* config/mips/mips.cc (mips_expand_ins_as_unaligned_store):
Add logic for generating instructions.
* config/mips/mips.h (ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mov_<load>l): Generates instructions.
(mov_<load>r): Same as above.
(mov_<store>l): Adjusted for the conditions above.
(mov_<store>r): Same as above.
(mov_<store>l_mips16e2): Add machine description for `define_insn mov_<store>l_mips16e2`.
(mov_<store>r_mips16e2): Add machine description for `define_insn mov_<store>r_mips16e2`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.

MIPS: Add LUI instruction for mips16e2

This patch adds LUI instruction from mips16e2
with corresponding test.

gcc/ChangeLog:

* config/mips/mips.cc (mips_symbol_insns_1): Generates LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* config/mips/mips.md: Same as above.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.

MIPS: Add bitwise instructions for mips16e2

There are shortened bitwise instructions in the mips16e2 ASE,
for instance, ANDI, ORI/XORI, EXT, INS, etc.

This patch adds these instructions with corresponding tests.
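
For illustration, the C helpers below spell out the bit-field extract and insert operations that EXT and INS perform. The names bits_extract and bits_insert are hypothetical; they are not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* EXT-like: return SIZE bits of X starting at bit POS, zero-extended.  */
uint32_t
bits_extract (uint32_t x, unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 32);
  uint32_t mask = size == 32 ? UINT32_MAX : (UINT32_C (1) << size) - 1;
  return (x >> pos) & mask;
}

/* INS-like: replace SIZE bits of X starting at bit POS with the low
   SIZE bits of FIELD.  */
uint32_t
bits_insert (uint32_t x, uint32_t field, unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 32);
  uint32_t low = size == 32 ? UINT32_MAX : (UINT32_C (1) << size) - 1;
  uint32_t mask = low << pos;
  return (x & ~mask) | ((field << pos) & mask);
}
```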

gcc/ChangeLog:

* config/mips/constraints.md (Yz): New constraints for mips16e2.
* config/mips/mips-protos.h (mips_bit_clear_p): Declared new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc (mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h (ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (extended_mips16): Generates EXT and INS instructions.
(*and<mode>3): Generates INS instruction.
(*and<mode>3_mips16): Generates EXT, INS and ANDI instructions.
(ior<mode>3): Add logic for ORI instruction.
(*ior<mode>3_mips16_asmacro): Generates ORI instruction.
(*ior<mode>3_mips16): Add logic for XORI instruction.
(*xor<mode>3_mips16): Generates XORI instruction.
(*extzv<mode>): Add logic for EXT instruction.
(*insv<mode>): Add logic for INS instruction.
* config/mips/predicates.md (bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.

MIPS: Add instruction about global pointer register for mips16e2

The mips16e2 ASE uses eight general-purpose registers
from mips32 together with some special-purpose registers.
The GPRs are s0-1, v0-1 and a0-3, and the
special registers are t8, gp, sp and ra.

As mentioned above, the special register gp, which is
the global pointer register, is used in mips16e2
by some of the instructions in the ASE,
for instance, ADDIU, LB/LBU, etc.

This patch adds these instructions with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Generate instructions
that use global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h (MIPS16_GP_LOADS): Defined a new macro.
(GLOBAL_POINTER_REGNUM): Moved to machine description `mips.md`.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Moved to here from above.
(*lowsi_mips16_gp): New `define_insn *low<mode>_mips16`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.

MIPS: Add MOVx instructions support for mips16e2

This patch adds MOVx instructions from mips16e2
(movn,movz,movtn,movtz) with corresponding tests.
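
As a sketch of the semantics involved, the loops below perform a conditional move per element: MOVN moves when the condition register is non-zero, MOVZ when it is zero. The function names are hypothetical, and MOVTN/MOVTZ are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MOVN-like: dst[i] = src[i] when cond[i] is non-zero.  */
void
select_if_nonzero (int32_t *dst, const int32_t *src, const int32_t *cond,
                   size_t n)
{
  assert (dst && src && cond);
  for (size_t i = 0; i < n; i++)
    if (cond[i] != 0)
      dst[i] = src[i];
}

/* MOVZ-like: dst[i] = src[i] when cond[i] is zero.  */
void
select_if_zero (int32_t *dst, const int32_t *src, const int32_t *cond,
                size_t n)
{
  assert (dst && src && cond);
  for (size_t i = 0; i < n; i++)
    if (cond[i] == 0)
      dst[i] = src[i];
}
```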

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_CONDMOVE): Add condition for ISA_HAS_MIPS16E2.
* config/mips/mips.md (*mov<GPR:mode>_on_<MOVECC:mode>): Add logic for MOVx insts.
(*mov<GPR:mode>_on_<MOVECC:mode>_mips16e2): Generate MOVx instruction.
(*mov<GPR:mode>_on_<GPR2:mode>_ne): Add logic for MOVx insts.
(*mov<GPR:mode>_on_<GPR2:mode>_ne_mips16e2): Generate MOVx instruction.
* config/mips/predicates.md (reg_or_0_operand_mips16e2): New predicate for MOVx insts.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.

MIPS: Add basic support for mips16e2

The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).

This patch adds basic support for mips16e2 used by the
following series of patches.

gcc/ChangeLog:

* config/mips/mips.cc (mips_file_start): Add mips16e2 info
for output file.
* config/mips/mips.h (__mips_mips16e2): Defined a new
predefine macro.
(ISA_HAS_MIPS16E2): Defined a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle the recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.

Daily bump.

d: Fix testcase failure of gdc.dg/Wbuiltin_declaration_mismatch2.d.

Seen at least on aarch64-*-darwin, the parameters used to instantiate
the shufflevector intrinsic meant the return type was __vector(int[1]),
which resulted in the error:

vector type '__vector(int[1])' is not supported on this platform.

All instantiations have now been fixed so the expected warning/error is
now given by the compiler.

gcc/testsuite/ChangeLog:

* gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests.

tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]

The match_uaddc_usubc matching doesn't require that the second
.{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is
at most one. So, in the weird case where the REALPART_EXPR of it isn't
present, we shouldn't ICE trying to replace that REALPART_EXPR with
REALPART_EXPR of .U{ADD,SUB}C result.
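
For context, the sketch below shows the usual add-with-carry idiom written with the GCC/Clang __builtin_add_overflow builtin: two overflow-checked additions per limb, with the carries merged. To my understanding this is the kind of source the matching is aimed at; it is not the testcase from the PR, and the function name add_limbs is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Add two N-limb numbers, least significant limb first, and return the
   final carry.  Uses the GCC/Clang __builtin_add_overflow builtin.  */
bool
add_limbs (unsigned long *sum, const unsigned long *a,
           const unsigned long *b, size_t n)
{
  assert (sum && a && b);
  bool carry = false;
  for (size_t i = 0; i < n; i++)
    {
      unsigned long t;
      bool c1 = __builtin_add_overflow (a[i], b[i], &t);
      bool c2 = __builtin_add_overflow (t, (unsigned long) carry, &sum[i]);
      carry = c1 | c2;
    }
  return carry;
}
```

Here the result of the second addition is always stored, so its real part is used. The ICE described above concerns the unusual case where it is not.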

2023-07-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/110508
* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with
REALPART_EXPR of nlhs if re2 is non-NULL.

* gcc.dg/pr110508.c: New test.

xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS

This is because smin and smax, both of which require TARGET_MINMAX, are
essential to the RTL representation.
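
As a rough C rendering of that representation, a signed clamp is a signed max followed by a signed min. The helper names are hypothetical, and the range of immediates the instruction accepts is not stated here, so the bound is simply a parameter.

```c
#include <assert.h>
#include <stdint.h>

static int32_t
smax32 (int32_t a, int32_t b)
{
  return a > b ? a : b;
}

static int32_t
smin32 (int32_t a, int32_t b)
{
  return a < b ? a : b;
}

/* Clamp X to the signed range [-2^BITS, 2^BITS - 1].  */
int32_t
clamp_signed (int32_t x, unsigned bits)
{
  assert (bits < 31);
  int32_t hi = (INT32_C (1) << bits) - 1;
  int32_t lo = -hi - 1;
  return smin32 (smax32 (x, lo), hi);
}
```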

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p):
Simplify.
* config/xtensa/xtensa.md (*xtensa_clamps):
Add TARGET_MINMAX to the condition.

xtensa: Fix missing mode warning in "*eqne_INT_MIN"

gcc/ChangeLog:

* config/xtensa/xtensa.md (*eqne_INT_MIN):
Add missing ":SI" to the match_operator.

Darwin, Objective-C: Support -fconstant-cfstrings [PR108743].

This supports the -fconstant-cfstrings option as used by clang (and
expected by some build scripts) as an alias to the target-specific
-mconstant-cfstrings.

The documentation is also updated to reflect that the 'f' option is
only available on Darwin, and to add the 'm' option to the Darwin
section of the invocation text.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/108743

gcc/ChangeLog:

* config/darwin.opt: Add fconstant-cfstrings alias to
mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect
that the fconstant-cfstrings is a target-option alias and to
add the missing mconstant-cfstrings option description to the
Darwin section.

libphobos: Handle Darwin Arm and AArch64 in fibre context asm.

This code currently fails to build because it contains ELF-
specific directives. This patch excludes those directives when
the platform is Darwin.

We do not expect switching fibres between threads to be safe here
either owing to the possible caching of TLS pointers.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libphobos/ChangeLog:

* libdruntime/config/aarch64/switchcontext.S: Exclude ELF-
specific constructs for Darwin.
* libdruntime/config/arm/switchcontext.S: Likewise.
* libdruntime/core/thread/fiber.d: Disable switching fibres
between threads.

d: Add testcase from PR108962

The issue was fixed in r14-2232.

PR d/108962

gcc/testsuite/ChangeLog:

* gdc.dg/pr108962.d: New test.

d: Fix core.volatile.volatileLoad discarded if result is unused

The first pass of code generation in the D front-end splits up all
compound expressions and discards expressions that have no side effects.
This included calls to the `volatileLoad' intrinsic if its result was
not used, causing such calls to be eliminated from the program.

We already set TREE_THIS_VOLATILE on the expression, however the
tree documentation says if this bit is set in an expression, so is
TREE_SIDE_EFFECTS. So set TREE_SIDE_EFFECTS on the expression too.
This prevents any early discarding from occurring.

PR d/110516

gcc/d/ChangeLog:

* intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the
expanded expression.
(expand_volatile_store): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr110516a.d: New test.
* gdc.dg/torture/pr110516b.d: New test.

Daily bump.

d: Fix accesses of immutable arrays using constant index still bounds checked

Starts setting TREE_READONLY against specific kinds of VAR_DECLs, so
that the middle-end/optimization passes can more aggressively constant
fold D code that makes use of `immutable' or `const'.

PR d/110514

gcc/d/ChangeLog:

* decl.cc (get_symbol_decl): Set TREE_READONLY on certain kinds of
const and immutable variables.
* expr.cc (ExprVisitor::visit (ArrayLiteralExp *)): Set TREE_READONLY
on immutable dynamic array literals.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110514a.d: New test.
* gdc.dg/pr110514b.d: New test.
* gdc.dg/pr110514c.d: New test.
* gdc.dg/pr110514d.d: New test.

libphobos, testsuite: Disable forkgc2 on Darwin [PR103944]

It hangs the testsuite (requiring manual intervention to kill the
spawned processes) which breaks CI. The reason for the hang is not
clear. This skips the test for now (xfail does not work).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR d/103944

libphobos/ChangeLog:

* testsuite/libphobos.gc/forkgc2.d: Skip for Darwin.

d: Don't generate code that throws exceptions when compiling with `-fno-exceptions'

The version flags for RTMI, RTTI, and exceptions were unconditionally
predefined.  These are now only predefined if the feature flag is
enabled.  It was noticed that there was no `-fexceptions' definition
inside d/lang.opt, so the detection of the exceptions option flag was
only partially working.  Once that was fixed, a few places in the
front-end implementation were found to fall foul of `nothrow' rules;
these have been fixed upstream and backported here as well.

Reviewed-on: https://github.com/dlang/dmd/pull/15357
     https://github.com/dlang/dmd/pull/15360

PR d/110471

gcc/d/ChangeLog:

* d-builtins.cc (d_init_versions): Predefine D_ModuleInfo,
D_Exceptions, and D_TypeInfo only if feature is enabled.
* lang.opt: Add -fexceptions.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110471a.d: New test.
* gdc.dg/pr110471b.d: New test.
* gdc.dg/pr110471c.d: New test.

Add testcase from PR25623

gcc/testsuite/ChangeLog:

PR tree-optimization/25623
* gfortran.dg/pr25623.f90: New test.

Fix profile update in copy-header

The most common source of profile mismatches is now the copy-header pass.  The reason is that
in the common case the duplicated header condition becomes constant true, which requires
changing the probability of the loop exit condition.

While this could be done by jump threading, it is not, since jump threading gives up on loops.
The copy-header pass now has logic to prove that the first exit will become true, so this
patch adds the necessary plumbing to the profile updating.
This is done in gimple_duplicate_sese_region in a way that is specific to this
particular case.  I think the general case is kind of unsolvable, and loop-ch is the
only user of the infrastructure.  If we later invent some new users, maybe we
can export the region and region_copy arrays and let the user do the update.
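
For readers unfamiliar with loop header copying, here is a source-level sketch of the transformation, written by hand rather than taken from compiler output. The function names are hypothetical. The loop test is duplicated in front of the loop, and the loop itself becomes a do-while.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the exit test sits in the loop header.  */
long
sum_while (const int *v, size_t n)
{
  assert (v != NULL || n == 0);
  long s = 0;
  size_t i = 0;
  while (i < n)
    {
      s += v[i];
      i++;
    }
  return s;
}

/* After header copying: the test is duplicated as a guard, and the
   remaining loop tests only at the bottom.  */
long
sum_header_copied (const int *v, size_t n)
{
  assert (v != NULL || n == 0);
  long s = 0;
  size_t i = 0;
  if (i < n) /* Copied header.  */
    do
      {
        s += v[i];
        i++;
      }
    while (i < n);
  return s;
}
```

Both functions compute the same result. What changes is where the exits are taken: zero-iteration executions now leave through the guard, so the probabilities and counts attached to the remaining loop exit have to be adjusted, which is the kind of update this patch is concerned with.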

With the patch we now get:

Pass dump id and name            |static mismat|dynamic mismatch
                                 |in count     |in count
107t cunrolli                    |      3    +3|        19237       +19237
127t ch                          |     13   +10|        19237
131t dom                         |     39   +26|        19237
133t isolate-paths               |     47    +8|        19237
134t reassoc                     |     49    +2|        19237
136t forwprop                    |     53    +4|       226943      +207706
159t cddce                       |     61    +8|       242222       +15279
161t ldist                       |     62    +1|       242222
172t ifcvt                       |     66    +4|       415472      +173250
173t vect                        |    143   +77|     10859784    +10444312
176t cunroll                     |    294  +151|    150357763   +139497979
183t loopdone                    |    291    -3|    150289533       -68230
194t tracer                      |    322   +31|    153230990     +2941457
195t fre                         |    317    -5|    153230990
197t dom                         |    286   -31|    154448079     +1217089
199t threadfull                  |    293    +7|    154724763      +276684
200t vrp                         |    297    +4|    155042448      +317685
204t dce                         |    294    -3|    155017073       -25375
206t sink                        |    292    -2|    155017073
211t cddce                       |    298    +6|    155018657        +1584
255t optimized                   |    296    -2|    155018657
256r expand                      |    273   -23|    154592622      -426035
258r into_cfglayout              |    268    -5|    154592661          +39
275r loop2_unroll                |    272    +4|    159701866     +5109205
291r ce2                         |    270    -2|    159723509
312r pro_and_epilogue            |    290   +20|    159792505       +68996
315r jump2                       |    296    +6|    164234016     +4441511
323r bbro                        |    294    -2|    159385430     -4848586

So ch introduces 10 new mismatches, while originally it introduced 308.  At bbro the
number of mismatches dropped from 432 to 294.
The worst offender is now the cunroll pass.  I think it is the case where a loop has multiple
exits and one of the exits becomes false in all but the last peeled iteration.

This is another case where a non-trivial loop update is needed.

Honza

gcc/ChangeLog:

* tree-cfg.cc (gimple_duplicate_sese_region): Add elliminated_edge
parameter; update profile.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Rename to ...
(static_loop_exit): ... this; return the edge to be eliminated.
(ch_base::copy_headers): Handle profile updating for eliminated exits.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ifc-20040816-1.c: Reduce number of mismatches
from 2 to 1.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: New test.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: New test.

i386: Add STV support for DImode and SImode rotations by constant.

This patch implements scalar-to-vector (STV) support for DImode and SImode
rotations by constant bit counts.  Scalar rotations are almost always
optimal on x86, requiring only one or two instructions, but it is also
possible to implement these efficiently with SSE2, requiring only one
or two instructions for SImode rotations and at most 3 instructions for
DImode rotations.  This allows GCC to STV rotations with a small or no
penalty if there are other (net) benefits to converting a chain.  An
example of the benefits is shown below, which is based upon the BLAKE2
cryptographic hash function:

unsigned long long a,b,c,d;

unsigned long rot(unsigned long long x, int y)
{
  return (x<<y) | (x>>(64-y));
}

void foo()
{
  d = rot(d ^ a,32);
  c = c + d;
  b = rot(b ^ c,24);
  a = a + b;
  d = rot(d ^ a,16);
  c = c + d;
  b = rot(b ^ c,63);
}

where with -m32 -O2 -msse2

Before (59 insns, 247 bytes):

foo: pushl   %edi
        xorl    %edx, %edx
        pushl   %esi
        pushl   %ebx
        subl    $16, %esp
        movq    a, %xmm1
        movq    d, %xmm0
        movq    b, %xmm2
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        movd    %xmm2, %ecx
        psrlq   $32, %xmm2
        movd    %xmm2, %ebx
        movl    %ecx, %eax
        shldl   $24, %ebx, %ecx
        shldl   $24, %eax, %ebx
        movd    %ebx, %xmm4
        movd    %ecx, %xmm2
        punpckldq       %xmm4, %xmm2
        movdqa  .LC0, %xmm4
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        movd    %xmm1, %esi
        psrlq   $32, %xmm1
        movd    %xmm1, %edi
        movl    %esi, %eax
        shldl   $16, %edi, %esi
        shldl   $16, %eax, %edi
        movd    %esi, %xmm1
        movd    %edi, %xmm3
        punpckldq       %xmm3, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        movd    %xmm0, 8(%esp)
        psrlq   $32, %xmm0
        movl    8(%esp), %eax
        movd    %xmm0, 12(%esp)
        movl    12(%esp), %edx
        shrdl   $1, %edx, %eax
        xorl    %edx, %edx
        movl    %eax, b
        movl    %edx, b+4
        addl    $16, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        ret

After (32 insns, 165 bytes):
        movq    a, %xmm1
        xorl    %edx, %edx
        movq    d, %xmm0
        movq    b, %xmm2
        movdqa  .LC0, %xmm4
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        pshufd  $68, %xmm2, %xmm2
        psrldq  $5, %xmm2
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        pshuflw $147, %xmm1, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        pshufd  $20, %xmm0, %xmm0
        psrlq   $1, %xmm0
        pshufd  $136, %xmm0, %xmm0
        pand    %xmm4, %xmm0
        movq    %xmm0, b
        ret
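
As a rough illustration of why a rotation by 16 is cheap in vector form, the
sketch below models the 16-bit word shuffle done by pshuflw in plain C and
compares it with a scalar rotate.  The helper names rotl64, shuffle_low_words
and check_rotate_by_16 are made up for this example; only the immediate 147 is
taken from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar 64-bit rotate left; y must be in the range 1..63.  */
static uint64_t rotl64(uint64_t x, int y)
{
  return (x << y) | (x >> (64 - y));
}

/* Model of pshuflw on the low 64 bits of a register: result word i is
   source word ((imm >> (2 * i)) & 3).  Hypothetical helper name.  */
static uint64_t shuffle_low_words(uint64_t x, unsigned imm)
{
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      unsigned sel = (imm >> (2 * i)) & 3;
      uint64_t word = (x >> (16 * sel)) & 0xffff;
      r |= word << (16 * i);
    }
  return r;
}

/* Rotating left by 16 matches the word shuffle with immediate 147.  */
static void check_rotate_by_16(uint64_t x)
{
  assert(shuffle_low_words(x, 147) == rotl64(x, 16));
}
```

The immediate 147 is binary 10 01 00 11, so result words 0..3 come from source
words 3, 0, 1 and 2, which is exactly a left rotation by one 16-bit word.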

2023-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
gains/costs for ROTATE and ROTATERT (by an integer constant).
(general_scalar_chain::convert_rotate): New helper function to
convert a DImode or SImode rotation by an integer constant into
SSE vector form.
(general_scalar_chain::convert_insn): Call the new convert_rotate
for ROTATE and ROTATERT.
(general_scalar_to_vector_candidate_p): Consider ROTATE and
ROTATERT to be candidates if the second operand is an integer
constant, valid for a rotation (or shift) in the given mode.
* config/i386/i386-features.h (general_scalar_chain): Add new
helper method convert_rotate.

gcc/testsuite/ChangeLog
* gcc.target/i386/rotate-6.c: New test case.
* gcc.target/i386/sse2-stv-1.c: Likewise.

Fix update_bb_profile_for_threading

Fix some of the profile mismatches caused by profile updating.
It seems that I misupdated update_bb_profile_for_threading in 2017, which
results in invalid updates from rtl threading and threadbackwards.
update_bb_profile_for_threading knows that some paths to BB are being
redirected elsewhere and those paths will exit from BB with E.  So it needs to
determine the probability of the duplicated path and redistribute probabilities.
For some reason, however, the conditional probability of the redirected path is
computed after its count is subtracted, which is wrong and often results in
a probability greater than 100%.
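
A toy model of the ordering problem, not the actual code in cfg.cc: the
function names and the plain integer counts below are assumptions made for
illustration only.  It shows how dividing by the block count after the path
count has already been subtracted can give more than 100%.

```c
#include <assert.h>

/* Probability, in percent, that an execution of a block with count
   bb_count belongs to a path with count path_count.
   Assumes bb_count > 0.  */
static int path_probability(long path_count, long bb_count)
{
  return (int) (100 * path_count / bb_count);
}

/* Intended order: use the block count from before the update.  */
static int probability_before_subtract(long path_count, long bb_count)
{
  return path_probability(path_count, bb_count);
}

/* Problematic order: the path count was already subtracted from the
   block count.  Assumes path_count < bb_count.  */
static int probability_after_subtract(long path_count, long bb_count)
{
  return path_probability(path_count, bb_count - path_count);
}
```

With a block count of 100 and a redirected path count of 60, the first
function gives 60 while the second gives 150.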

I also fixed the error message.  Compiling tramp3d I now get the following passes
producing mismatches:
Pass dump id and name            |static mismatch|dynamic mismatch
                                 |in count       |in count
113t fre                         |      2    +2|            0
114t mergephi                    |      2      |            0
115t threadfull                  |      2      |            0
116t vrp                         |      2      |            0
127t ch                          |    307  +305|    347194302   +347194302
130t thread                      |    313    +6|    347221478       +27176
131t dom                         |    321    +8|    346841121      -380357
134t reassoc                     |    323    +2|    346841121
136t forwprop                    |    327    +4|    347026371      +185250
144t pre                         |    326    -1|    347040926       +14555
172t ifcvt                       |    338    +2|    347218249      +156280
173t vect                        |    409   +71|    356357418     +9139169
176t cunroll                     |    377   -32|    126071925   -230285493
183t loopdone                    |    376    -1|    126015489       -56436
194t tracer                      |    379    +3|    127258199     +1242710
197t dom                         |    375    -4|    128352165     +1093966
199t threadfull                  |    379    +4|    128526112      +173947
200t vrp                         |    381    +2|    128724673      +198561
204t dce                         |    374    -7|    128632495       -92178
206t sink                        |    370    -4|    128618043       -14452
211t cddce                       |    372    +2|    128632495       +14452
248t ehcleanup                   |    370    -2|    128618755       -13740
255t optimized                   |    362    -8|    128576810       -41945
256r expand                      |    356    -6|    128899768      +322958
258r into_cfglayout              |    353    -3|    129051765      +151997
259r jump                        |    354    +1|    129051765
262r cse1                        |    353    -1|    129051765
275r loop2_unroll                |    355    +2|    132182110     +3130345
277r loop2_done                  |    354    -1|    132182109           -1
312r pro_and_epilogue            |    371   +17|    132222324       +40215
323r bbro                        |    375    +4|    132095926      -126398

Without the patch at jump2 time we get over 432 mismatches, so 15%
improvement. Some of the mismatches are unavoidable.

I think ch mismatches are mostly due to loop header copying where the header
condition constant propagates.  Most common case should be threadable in early
optimizations and we also could do better on profile updating here.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR tree-optimization/103680
* cfg.cc (update_bb_profile_for_threading): Fix profile update;
make message clearer.

gcc/testsuite/ChangeLog:

PR tree-optimization/103680
* gcc.dg/tree-ssa/pr103680.c: New test.
* gcc.dg/tree-prof/cmpsf-1.c: Un-xfail.

Daily bump.

c++: fix up caching of level lowered ttps

Due to level/depth mismatches between the template parameters of a level
lowered ttp and the original ttp, the ttp comparison check added by
r14-418-g0bc2a1dc327af9 never actually holds outside of erroneous cases.
Moreover, it'd be good to also cache the overall TEMPLATE_TEMPLATE_PARM
instead of only the TEMPLATE_PARM_INDEX.

It's tricky to cache all level lowered ttps since the result of level
lowering may depend on more than just the depth of the arguments, e.g.
for TT in

  template<class T>
  struct A {
    template<template<T> class TT> void f();
  };

the substitution T=int yields a different level-lowered ttp than T=char.
But these kinds of ttps seem to be rare in practice, and "simple" ttps
that don't depend on outer template parameters are easy enough to cache
like so.  Unfortunately, this means we're back to expecting a duplicate
error in nontype12.C again since the ttp in question isn't "simple" so
caching of the (erroneous) lowered ttp doesn't happen.

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_PARM_DESCENDANTS): Harden.
(TEMPLATE_TYPE_DESCENDANTS): Define.
(TEMPLATE_TEMPLATE_PARM_SIMPLE_P): Define.
* pt.cc (reduce_template_parm_level): Revert
r14-418-g0bc2a1dc327af9 change.
(process_template_parm): Set TEMPLATE_TEMPLATE_PARM_SIMPLE_P
appropriately.
(uses_outer_template_parms): Determine the outer depth of
a template template parm without relying on DECL_CONTEXT.
(tsubst) <case TEMPLATE_TEMPLATE_PARM>: Cache lowering a
simple template template parm.  Consistently use 'code'.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Refine and XFAIL the dg-bogus
duplicate diagnostic check.

Use TYPE_INCLUDES_FLEXARRAY in __builtin_object_size [PR tree-optimization/101832]

__builtin_object_size should treat struct with TYPE_INCLUDES_FLEXARRAY as
flexible size.

gcc/ChangeLog:

PR tree-optimization/101832
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.

Fix couple of endianness issues in fold_ctor_reference

fold_ctor_reference attempts to use a recursive local processing in order
to call native_encode_expr on the leaf nodes of the constructor, before
falling back to calling native_encode_initializer if this fails.

There are a couple of issues related to endianness present in it:
  1) it does not specifically handle integral bit-fields; now these are left
justified on big-endian platforms so cannot be treated like ordinary fields.
  2) it does not check that the constructor uses the native storage order.

gcc/
* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise.  Specifically deal
with integral bit-fields.
(fold_ctor_reference): Make sure that the constructor uses the
native storage order.

gcc/testsuite/
* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.
* gcc.c-torture/execute/20230630-3.c: Likewise.
* gcc.c-torture/execute/20230630-4.c: Likewise.

jit.exp: handle dwarf version mismatch in jit-check-debug-info [PR110466]

gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/jit.exp (jit-check-debug-info): Gracefully handle too
early versions of gdb that don't support our dwarf version, via
"unsupported".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

jit: avoid using __vector in testcase [PR110466]

r13-4531-gd2e782cb99c311 added test coverage to libgccjit's vector
support, but used __vector, which doesn't work on Power. Additionally
the size param to gcc_jit_type_get_vector was wrong.

Fixed thusly.

gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/test-expressions.c (run_test_of_comparison): Fix size
param to gcc_jit_type_get_vector.
(verify_comparisons): Use a typedef rather than __vector.

Co-authored-by: Marek Polacek <polacek@redhat.com>
Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Fix iostream init for Clang on darwin [PR110432]

The __has_attribute(init_priority) check in <iostream> is true for Clang
on darwin, which means that user code including <iostream> thinks the
library will initialize the global streams. However, when libstdc++ is
built by GCC on darwin, the __has_attribute(init_priority) check is
false, which means that the library thinks that user code will do the
initialization when <iostream> is included. This means that the
initialization is never done.

Add an autoconf check so that the header and the library both make their
decision based on the static properties of GCC at build time, with a
consistent outcome.

As a belt and braces check, also do the initialization in <iostream> if
the compiler including that header doesn't support the attribute (even
if the library also contains the initialization). This might result in
redundant initialization done in <iostream>, but ensures the
initialization happens somewhere if there's any doubt about the
attribute working correctly due to missing linker support.

libstdc++-v3/ChangeLog:

PR libstdc++/110432
* acinclude.m4 (GLIBCXX_CHECK_INIT_PRIORITY): New.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_INIT_PRIORITY.
* include/std/iostream: Use new autoconf macro as well as
__has_attribute.
* src/c++98/ios_base_init.h: Use new autoconf macro instead of
__has_attribute.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Fix unused warning for new variable

This newly-introduced variable isn't used on all paths, so add the
[[maybe_unused]] attribute.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (random_device::_M_init): Add maybe_unused
attribute.

Fix handling of __builtin_expect_with_probability and improve first-match heuristics

While looking into the std::vector _M_realloc_insert codegen I noticed that
call of __throw_bad_alloc is predicted with 10% probability. This is because
the conditional guarding it has __builtin_expect (cond, 0) on it.  This
incorrectly takes precedence over more reliable heuristics predicting that call
to cold noreturn is likely not going to happen.

So I reordered the predictors so __builtin_expect_with_probability comes first
after predictors that never make a mistake (so user can use it to always
specify the outcome by hand).  I also downgraded malloc predictor since I do
think user-defined malloc functions & new operators may behave in funny ways and
moved usual __builtin_expect after the noreturn cold predictor.

This triggered a latent bug in expr_expected_value_1 where

  if (*predictor < predictor2)
    *predictor = predictor2;

should be:

  if (predictor2 < *predictor)
    *predictor = predictor2;

which eventually triggered an ICE on combining heuristics.  This made me notice
that we can do slightly better while combining expected values in case only
one of the parameters (such as in a*b when we expect a==0) can determine
overall result.

Note that the new code may pick weaker heuristics in case that both values are
predicted.  Not sure if this scenario is worth the extra CPU time: there is
no correct way to combine the probabilities anyway since we do not know if
the predictions are independent, so I think users should not rely on it.

Fixing this issue uncovered another problem.  In 2018 Martin Liska added
code predicting that MALLOC returns non-NULL but instead of that he predicts
that it returns true (boolean 1).  This sort of works for testcase testing
malloc (10) != NULL
but, for example, we will predict
malloc (10) == malloc (10)
as true, which is not right and such comparison may happen in real code.

I think the proper way is to update expr_expected_value_1 to work with value
ranges, but that needs greater surgery so I decided to postpone this and
only add a FIXME and file PR110499.

gcc/ChangeLog:

PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with
predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the priority
queue.
* predict.h (estimate_bb_frequencies): No longer declare it.

gcc/testsuite/ChangeLog:

PR middle-end/109849
* gcc.dg/predict-18.c: Improve testcase.

modula-2: Amend the handling of failed select() calls in RTint [PR108835].

When we make a select() that fails, there is an attempt to (a) diagnose
why and (b) make a fallback. These actions are causing some tests to
hang on some Darwin versions, this is because the first action that is
tried to assist in diagnosis/fallback handling is to replace the set
timeout with NIL (which causes select to wait forever, modulo other
reasons it might complete).

To fix this, call select with a zero timeout when checking for error
conditions. Also, as we check the possible failure conditions, if we
find a change that succeeds, then stop looking for errors.
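
For readers less familiar with select, a minimal C sketch of a non-blocking
check (the actual fix is in the Modula-2 module RTint.mod; the helper name
poll_fd_readable is hypothetical).  A zeroed timeval makes select return
immediately, whereas a null timeout pointer blocks until a descriptor is
ready.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return 1 if fd is readable right now, 0 if not, -1 on error.
   The zero timeout makes this a pure poll that never blocks.  */
static int poll_fd_readable(int fd)
{
  fd_set readfds;
  struct timeval timeout = { 0, 0 };

  FD_ZERO(&readfds);
  FD_SET(fd, &readfds);

  int ready = select(fd + 1, &readfds, NULL, NULL, &timeout);
  if (ready < 0)
    return -1;
  return ready > 0 && FD_ISSET(fd, &readfds);
}
```

Called on the read end of an empty pipe this returns 0 at once; after a byte
is written to the other end it returns 1.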

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR testsuite/108835

gcc/m2/ChangeLog:

* gm2-libs/RTint.mod: Do not use NIL timeout setting on select,
test failures sequentially, finishing on the first success.

libstdc++: Make std::random_device throw more std::system_error [PR105081]

In r14-289-gf9412cedd6c0e7 I made the std::random_device constructor
throw std::system_error for unrecognized tokens. But it still throws
std::runtime_error for a token such as "rdseed" that is recognized but
not supported at runtime by the CPU the program is running on.

With this change we throw std::system_error for those cases too. This
fixes the following failures on Intel CPUs without rdseed support:

FAIL: 26_numerics/random/random_device/94087.cc execution test
FAIL: 26_numerics/random/random_device/cons/token.cc execution test
FAIL: 26_numerics/random/random_device/entropy.cc execution test

libstdc++-v3/ChangeLog:

PR libstdc++/105081
* src/c++11/random.cc (random_device::_M_init): Throw
std::system_error when the requested device is a valid token but
not available at runtime.

fold-const+optabs: Change return type of predicate functions from int to bool

Also change some internal variables and function argument from int to bool.

gcc/ChangeLog:

* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change negl_p, neg_litp_p,
neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type
from int to bool and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type
from int to bool and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.

AArch64: New RTL for ABDL

This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>):
Expansions for abd vec widen optabs.
(aarch64_<su>abdl<mode>_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.

Mid engine setup [SU]ABDL

This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
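
A scalar loop of roughly the following shape computes a widening absolute
difference: the inputs are 8 bits wide and the result is stored in 16 bits.
This is only a sketch with a made-up function name; whether GCC recognizes it
and emits a widening instruction depends on the target and the options used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Widening absolute difference: 8-bit inputs, 16-bit results.
   a[i] and b[i] are promoted to int before the subtraction, so the
   difference cannot wrap.  Hypothetical function name.  */
static void widen_abs_diff(uint16_t *out, const uint8_t *a,
                           const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint16_t) abs(a[i] - b[i]);
}
```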

gcc/ChangeLog:

* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
New optabs.
* doc/md.texi: Document them.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.

Regenerate lto-plugin/Makefile.in

Commit regenerated lto-plugin/Makefile.in in order to reflect the changes
from the introduction of --enable-host-pie.

lto-plugin/ChangeLog:

2023-06-30 Martin Jambor <mjambor@suse.cz>

* Makefile.in: Regenerate.

RISC-V: Refactor vxrm_mode attr for type attr equal

This patch refactors the vxrm_mode attr to remove the duplicated
eq_attr condition. The common condition of the attr is extracted to one
place instead of being repeated in many places.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector.md: Refactor the common condition.

tree-optimization/110496 - TYPE_PRECISION issue with store-merging

When store-merging looks for bswap opportunities we also handle
BIT_FIELD_REFs where we verify the refed object is of scalar
type but we don't check for the result type we eventually use.
That's done later but after we eventually query TYPE_PRECISION.
The following re-orders this.

PR tree-optimization/110496
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Re-order
verifying and TYPE_PRECISION query for the BIT_FIELD_REF case.

* gcc.dg/pr110496.c: New testcase.

MAINTAINERS file: Added myself to Write After Approval and DCO

ChangeLog:

2023-06-30 Rishi Raj <rishiraj45035@gmail.com>

* MAINTAINERS: Added myself to Write After Approval and DCO

middle-end/110489 - avoid useless work on statistics

When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it. When a TU has many small
functions this can take considerable time. The following avoids
this by never allocating the hash from this function.
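
The idea can be sketched as an accessor with an "allocate" flag, so that a
read-only caller never creates the table.  This is an illustration under
assumed names (stats_table, get_stats_table, finish_pass), not the code in
statistics.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct stats_table { int n_entries; };

/* NULL until some caller actually needs to record a statistic.  */
static struct stats_table *current_table;

/* Return the table, creating it only when alloc is true.  */
static struct stats_table *get_stats_table(bool alloc)
{
  if (!current_table && alloc)
    current_table = calloc(1, sizeof *current_table);
  return current_table;
}

/* End-of-pass hook: reports the number of entries to dump and
   never allocates the table itself.  */
static int finish_pass(void)
{
  struct stats_table *t = get_stats_table(false);
  return t ? t->n_entries : 0;
}
```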

PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.

Flip the nvptx port to LRA

... understanding that "turn on LRA" is an exaggeration here, given that nvptx
isn't actually doing register allocation ('TARGET_NO_REGISTER_ALLOCATION').

gcc/
* config/nvptx/nvptx.cc (TARGET_LRA_P): Remove.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

tree-optimization/110381 - fix testcase

This adds a missing check_vect () to the execute testcase.

PR tree-optimization/110381
* gcc.dg/vect/pr110381.c: Add check_vect ().

libstdc++: Re-apply PR108672 fix (avoid use of naked int32_t in unseq_backend_simd.h)

The fix was overwritten by r14-2109-g3162ca09dbdc2e "libstdc++:
Synchronize PSTL with upstream".

libstdc++-v3:

PR libstdc++/108672
* include/pstl/unseq_backend_simd.h (__simd_or): Re-apply using
__INT32_TYPE__ instead of int32_t.

mips: Fix overaligned function arguments [PR109435]

This patch changes alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of
original (aliased) types.

This change makes it impossible for a typedef type to have
alignment that is less than its size.

2023-06-27 Jovan Dmitrović <jovan.dmitrovic@syrmia.com>

gcc/ChangeLog:

PR target/109435
* config/mips/mips.cc (mips_function_arg_alignment): Returns
the alignment of function argument. In case of typedef type,
it returns the alignment of the aliased type.
(mips_function_arg_boundary): Relocated calculation of the
alignment of function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/mips/align-1-n64.c: New test.
* gcc.target/mips/align-1-o32.c: New test.

Daily bump.

analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]

g++.dg/analyzer/PR100244.C was failing after a patch of PR109439.
The reason was a spurious preemptive return of get_store_value upon
out-of-bounds read that was preventing further checks. Now instead,
a boolean value check_poisoned goes to false when an OOB is detected,
and is later on given to get_or_create_initial_value.

gcc/analyzer/ChangeLog:
PR analyzer/110198
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks.
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer returns
on OOB, but rather gives a boolean to get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.

Signed-off-by: benjamin priour <priour.be@gmail.com>

Compute ipa-predicates for conditionals involving __builtin_expect_p

std::vector allocator looks as follows:

__attribute__((nodiscard))
struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753)
{
  bool _1;
  long int _2;
  long int _3;
  long unsigned int _5;
  struct pair * _9;

  <bb 2> [local count: 1073741824]:
  _1 = __n_7(D) > 1152921504606846975;
  _2 = (long int) _1;
  _3 = __builtin_expect (_2, 0);
  if (_3 != 0)
    goto <bb 3>; [10.00%]
  else
    goto <bb 6>; [90.00%]

  <bb 3> [local count: 107374184]:
  if (__n_7(D) > 2305843009213693951)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 53687092]:
  std::__throw_bad_array_new_length ();

  <bb 5> [local count: 53687092]:
  std::__throw_bad_alloc ();

  <bb 6> [local count: 966367641]:
  _5 = __n_7(D) * 8;
  _9 = operator new (_5);
  return _9;
}

So there is a check for the allocated block size being greater than max_size,
which is wrapped in __builtin_expect.  This makes ipa-fnsummary give up analyzing
predicates and it will miss the fact that the two different calls to __throw
will be optimized out if __n is already smaller than 1152921504606846975, which
it is after _M_check_len.

This patch extends ipa-fnsummary to understand functions that return their
parameter.
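
For reference, __builtin_expect (exp, c) evaluates to exp; the second argument
is only a hint about the expected value.  The small C sketch below mirrors the
shape of the size check in the dump above; the names MAX_ELEMS and
checked_alloc_size are invented for this example, and returning 0 merely
stands in for the throwing calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ELEMS ((size_t) -1 / 8)

/* Return the byte size for n 8-byte elements, or 0 if n is too large.
   __builtin_expect returns its first argument unchanged, so the branch
   condition is still just n > MAX_ELEMS.  */
static size_t checked_alloc_size(size_t n)
{
  if (__builtin_expect(n > MAX_ELEMS, 0))
    return 0;   /* unlikely path: stands in for the throwing calls */
  return n * 8;
}
```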

gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.

testsuite: Use -fno-report-bug in gcc.dg/plugin/

Certain downstream compilers (for example, in Fedora) default to
-freport-bug.  The extra output breaks the following tests.  We can use
-fno-report-bug to fix that.  Patch verified with:

$ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\} plugin.exp'

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/crash-test-ice-sarif.c: Use -fno-report-bug.  Adjust
scan-sarif-file.
* gcc.dg/plugin/crash-test-ice-stderr.c: Use -fno-report-bug.
* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Use
-fno-report-bug.  Adjust scan-sarif-file.
* gcc.dg/plugin/crash-test-write-though-null-stderr.c: Use
-fno-report-bug.

i386: add -fno-stack-protector to two tests

These tests fail when the testsuite is executed with -fstack-protector-strong.
To avoid this, this patch adds -fno-stack-protector to dg-options.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104610.c: Use -fno-stack-protector.
* gcc.target/i386/pr69482-1.c: Likewise.

c++: NSDMI instantiation during overload resolution [PR110468]

Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set. The flag causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
which eventually leads to an ICE from nothrow_spec_p.

This patch fixes this by clearing any special tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.

PR c++/110468

gcc/cp/ChangeLog:

* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept79.C: New test.

c++: unpropagated CONSTRUCTOR_MUTABLE_POISON [PR110463]

Here we're incorrectly accepting the mutable member accesses because
cp_fold neglects to propagate CONSTRUCTOR_MUTABLE_POISON when folding a
CONSTRUCTOR.

PR c++/110463

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) <case CONSTRUCTOR>: Propagate
CONSTRUCTOR_MUTABLE_POISON.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable6.C: New test.

Update documentation to clarify a GCC extension [PR c/77650]

on a structure with a C99 flexible array member being nested in
another structure.

"The GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

There are two situations:

   * A structure containing a C99 flexible array member, or a union
     containing such a structure, is the last field of another structure,
     for example:

          struct flex  { int length; char data[]; };
          union union_flex { int others; struct flex f; };

          struct out_flex_struct { int m; struct flex flex_data; };
          struct out_flex_union { int n; union union_flex flex_data; };

     In the above, both 'out_flex_struct.flex_data.data[]' and
     'out_flex_union.flex_data.f.data[]' are considered as flexible
     arrays too.

   * A structure containing a C99 flexible array member, or a union
     containing such a structure, is not the last field of another structure,
     for example:

          struct flex  { int length; char data[]; };

          struct mid_flex { int m; struct flex flex_data; int n; };

     In the above, accessing a member of the array 'mid_flex.flex_data.data[]'
     might have undefined behavior.  Compilers do not handle such a case
     consistently.  Any code relying on this case should be modified to ensure
     that flexible array members only end up at the ends of structures.

     Please use the warning option '-Wflex-array-member-not-at-end' to
     identify all such cases in the source code and modify them.  This extension
     is now deprecated.
"

PR c/77650
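
As a reminder of the layout the documentation text quoted above recommends,
here is a minimal sketch of a structure whose flexible array member is its
last field, with storage for the array allocated together with the structure.
The helper name flex_new is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct flex { int length; char data[]; };

/* Allocate a struct flex with room for n bytes in the trailing array
   and copy n bytes from src into it.  Returns NULL on allocation
   failure; the caller frees the result with free.  */
static struct flex *flex_new(const char *src, int n)
{
  struct flex *f = malloc(sizeof *f + (size_t) n);
  if (!f)
    return NULL;
  f->length = n;
  memcpy(f->data, src, (size_t) n);
  return f;
}
```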

gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.

Introduce IR bit TYPE_INCLUDES_FLEXARRAY for the GCC extension

on a structure with a C99 flexible array member being nested in
another structure

GCC extension accepts the case when a struct with a flexible array member
is embedded into another struct or union (possibly recursively) as the last
field.
This patch is to introduce the IR bit TYPE_INCLUDES_FLEXARRAY (reuse the
existing IR bit TYPE_NO_NAMED_ARGS_STDARG_P), set it correctly in C FE,
stream it correctly in Middle-end, and print it during IR dumping.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.

Move maybe_set_nonzero_bits() to its only user.

gcc/ChangeLog:

* tree-vrp.cc (maybe_set_nonzero_bits): Move from here...
* tree-ssa-dom.cc (maybe_set_nonzero_bits): ...to here.
* tree-vrp.h (maybe_set_nonzero_bits): Remove.

Tidy up the range normalization code.

There are a few spots where a range is being altered in place, but we
fail to normalize the range. This patch makes sure we always
call normalize_kind(), and that normalize_kind() in turn calls
verify_range() to make sure everything is canonical.

gcc/ChangeLog:

* value-range.cc (frange::set): Do not call verify_range.
(frange::normalize_kind): Verify range.
(frange::union_nans): Do not call verify_range.
(frange::union_): Same.
(frange::intersect): Same.
(irange::irange_single_pair_union): Call normalize_kind if
necessary.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_range_from_nonzero_bits): Verify range.
(irange::set_nonzero_bits): Call normalize_kind if necessary.
(irange::get_nonzero_bits): Tweak comment.
(irange::intersect_nonzero_bits): Call normalize_kind if
necessary.
(irange::union_nonzero_bits): Same.
* value-range.h (irange::normalize_kind): Verify range.

cselib+expr+bitmap: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* cselib.h (rtx_equal_for_cselib_1):
Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type
from int to bool and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(is_aligning_offset): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and
"const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.

libstdc++: Fix src/c++20/tzdb.cc for non-constexpr std::mutex

Building libstdc++ reportedly fails for targets without lock-free
std::atomic<T*> which don't define __GTHREAD_MUTEX_INIT:

src/c++20/tzdb.cc:110:21: error: 'constinit' variable 'std::chrono::{anonymous}::list_mutex' does not have a constant initializer
src/c++20/tzdb.cc:110:21: error: call to non-'constexpr' function 'std::mutex::mutex()'

The solution implemented by this commit is to use a local static mutex
when it can't be constinit, so that it's constructed on first use.

With this change, we can also simplify the preprocessor logic for
defining USE_ATOMIC_SHARED_PTR. It now depends on the same conditions as
USE_ATOMIC_LIST_HEAD, so in theory we could have a single macro. Keeping
them separate would allow us to replace the use of atomic<shared_ptr<T>>
with a mutex if that performs better, without having to give up on the
lock-free cache for fast access to the list head.
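
The idea above can be sketched as follows. This is only an illustration and
not the code in tzdb.cc: the macro SKETCH_MUTEX_IS_CONSTEXPR and the helper
bump_under_lock are hypothetical names standing in for the real configuration
checks and the real users of the mutex.

```cpp
#include <mutex>

// Hypothetical switch standing in for the real configuration checks.
// Define it to 1 only where std::mutex has a constexpr constructor.
#ifndef SKETCH_MUTEX_IS_CONSTEXPR
# define SKETCH_MUTEX_IS_CONSTEXPR 0
#endif

namespace sketch
{
  // Accessor replacing a global mutex object.
  inline std::mutex&
  list_mutex()
  {
#if SKETCH_MUTEX_IS_CONSTEXPR
    // Constant-initialized, no dynamic initialization needed (C++20).
    constinit static std::mutex m;
#else
    // Local static: constructed on the first call, and that
    // initialization is thread-safe.
    static std::mutex m;
#endif
    return m;
  }

  // Hypothetical user of the mutex.
  inline int
  bump_under_lock()
  {
    static int counter = 0;
    std::lock_guard<std::mutex> lock(list_mutex());
    return ++counter;
  }
}
```

Callers only ever go through list_mutex(), so they do not need to know which
of the two storage strategies was selected.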

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (USE_ATOMIC_SHARED_PTR): Define consistently
with USE_ATOMIC_LIST_HEAD.
(list_mutex): Replace global object with function. Use local
static object when std::mutex constructor isn't constexpr.

libstdc++: Do not use off64_t in calls to copy_file_range [PR110462]

Although the copy_file_range(2) man page shows the arguments as off64_t*
that is not portable. For musl there is no off64_t type, as off_t is
always 64-bit. Use the loff_t type which is always 64-bit even if off_t
isn't. We could just use off_t because the filesystem library is
compiled with _FILE_OFFSET_BITS=64, but loff_t is the more correct type
for this interface.

libstdc++-v3/ChangeLog:

PR libstdc++/110462
* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check that
copy_file_range can be called with loff_t* arguments.
* configure: Regenerate.
* src/filesystem/ops-common.h (copy_file_copy_file_range):
Use loff_t for offsets.

c++: cache partial template specialization selection

There's currently no cheap way to obtain the partial template
specialization (and arguments relative to it) that was selected for a
class or variable template specialization.  Our only option is to
compute the result from scratch via most_specialized_partial_spec.

For class templates this isn't really an issue because we usually need
this information just once, upon instantiation.  But for variable
templates we need it upon specialization and also later upon instantiation.
We could implement an ad-hoc cache for variable templates only, but it'd
be nice for this information to be readily available in general.

To that end, this patch adds a TI_PARTIAL_INFO field to TEMPLATE_INFO
that holds another TEMPLATE_INFO consisting of the partial template and
arguments relative to it, which most_specialized_partial_spec then
uses to transparently cache its (now TEMPLATE_INFO) result.

Similarly, there's no easy way to go from the DECL_TEMPLATE_RESULT of a
partial TEMPLATE_DECL back to that TEMPLATE_DECL.  (Our best option is to
walk the DECL_TEMPLATE_SPECIALIZATIONS list of the primary TEMPLATE_DECL.)
So this patch also uses this new field to link these entities in both
directions.

gcc/cp/ChangeLog:

* cp-tree.h (tree_template_info::partial): New data member.
(TI_PARTIAL_INFO): New tree accessor.
(most_specialized_partial_spec): Add defaulted bool parameter.
* module.cc (trees_out::core_vals) <case TEMPLATE_INFO>: Stream
TI_PARTIAL_INFO.
(trees_in::core_vals) <case TEMPLATE_INFO>: Likewise.
* parser.cc (specialization_of): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead
of TREE_LIST.
* pt.cc (process_partial_specialization): Set TI_PARTIAL_INFO
of 'decl' to point back to the partial TEMPLATE_DECL.  Likewise
(and pass rechecking=true to most_specialization_partial_spec).
(instantiate_class_template): Likewise.
(instantiate_template): Set TI_PARTIAL_INFO to the result of
most_specialization_partial_spec after forming a variable
template specialization.
(most_specialized_partial_spec): Add 'rechecking' parameter.
Exit early if the template is not primary.  Use the TI_PARTIAL_INFO
of the corresponding TEMPLATE_INFO as a cache unless 'rechecking'
is true.  Don't bother setting TREE_TYPE of each TREE_LIST.
(instantiate_decl): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead of
TREE_LIST.
* ptree.cc (cxx_print_xnode) <case TEMPLATE_INFO>: Dump
TI_PARTIAL_INFO.

Relax type-printer regexp in libstdc++ test suite

The libstdc++ test suite checks whether gdb type printers are
available like so:

    set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
   "\\\[\\\]"]

This regexp assumes that the list of printers is empty.  However,
sometimes it's convenient to ship a gdb that comes with some default
printers, causing this to erroneously report that gdb is "too old".

I believe the intent of this check is to ensure that gdb.type_printers
exists -- not to check its starting value.  This patch changes the
check to accept any Python list as output.

Note that the patch doesn't look for the trailing "]".  I tried this
but in my case the output was too long for expect.  It seemed fine to
just check the start, as the point really is to reject the case where
the command prints an error message.

libstdc++-v3/ChangeLog

* testsuite/lib/gdb-test.exp (gdb-test): Relax type-printer
regexp.

tree-ssa-math-opts: Use element_precision.

The recent TYPE_PRECISION changes to detect improper usage
cause an ICE in divmod_candidate_p for RVV when called with
a vector type. Therefore, use element_precision instead.

gcc/ChangeLog:

* tree-ssa-math-opts.cc (divmod_candidate_p): Use
element_precision.

[Committed] Add -mmove-max=128 -mstore-max=128 to pieces-memcmp-2.c

Adding -mmove-max=128 and -mstore-max=128 to the dg-options of the
recently added gcc.target/i386/pieces-memcmp-2.c avoids changing the
intent of this testcase when adding -march=cascadelake to RUNTESTFLAGS.
Committed as obvious.

2023-06-29 Roger Sayle <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: Specify that 128-bit
comparisons are desired, to see if 256-bit instructions are
generated inappropriately (fixes test on -march=cascadelake).

tree-optimization/110460 - fend off vector types from vectorizer

The following makes fending off existing vector types from vectorization
also apply to word_mode vector types. I've chosen to add a positive
list of allowed scalar types here for clarity.

PR tree-optimization/110460
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Only allow integral, pointer and scalar float type scalar_type.

Avoid adding loop-carried ops to long chains

Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-carried ops in a
separate chain.
   E.g.
   x_1 = phi(x_0, x_2)
   y_1 = phi(y_0, y_2)

   a + b + c + d + e + x1 + y1

   SSA1 = a + b;
   SSA2 = c + d;
   SSA3 = SSA1 + e;
   SSA4 = SSA3 + SSA2;
   SSA5 = x1 + y1;
   SSA6 = SSA4 + SSA5;
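
   The arrangement above can be written out as ordinary code. This is a
   sketch only: the function names are made up for illustration, and unsigned
   integers are used so that both orderings compare exactly (the real cases
   are floating-point loops where reassociation has to be permitted).

```cpp
namespace sketch
{
  // Plain left-to-right evaluation, for reference.
  inline unsigned
  sum_left_to_right(unsigned a, unsigned b, unsigned c, unsigned d,
                    unsigned e, unsigned x1, unsigned y1)
  {
    return a + b + c + d + e + x1 + y1;
  }

  // Mirrors SSA1..SSA6 from the example: the loop-carried operands
  // x1 and y1 get their own chain and are joined in at the very end.
  inline unsigned
  sum_separate_chains(unsigned a, unsigned b, unsigned c, unsigned d,
                      unsigned e, unsigned x1, unsigned y1)
  {
    unsigned ssa1 = a + b;
    unsigned ssa2 = c + d;
    unsigned ssa3 = ssa1 + e;
    unsigned ssa4 = ssa3 + ssa2;  // no loop-carried operand so far
    unsigned ssa5 = x1 + y1;      // loop-carried operands only
    unsigned ssa6 = ssa4 + ssa5;  // single join of the two chains
    return ssa6;
  }
}
```

   Only ssa5 and ssa6 depend on values from the previous iteration, so the
   rest of the chain does not have to wait for them.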

With the patch applied, these test cases improved by 32%~100%.

S242:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}

Case 1:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

Case 2:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

The value is the execution time
A: original version
B: with FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 (based on A)
C: with current patch (based on B)

          A      B      C      B/A          C/A
s242      2.859  5.152  2.859  1.802028681  1
case 1    5.489  5.488  3.511  0.999818     0.64
case 2    7.216  7.499  4.885  1.039218     0.68

gcc/ChangeLog:

PR tree-optimization/110148
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle loop-carried
ops in this function.

[testsuite] tolerate enabled but missing language frontends

When a language is enabled but we run the testsuite against a tree in
which the frontend compiler is not present, help.exp fails.  It
recognizes the output pattern for a disabled language, but not a
missing frontend.  Extend the pattern so that it covers both cases.

for  gcc/testsuite/ChangeLog

* lib/options.exp (check_for_options_with_filter): Handle
missing frontend compiler like disabled language.

middle-end/110452 - bad code generation with AVX512 mask splat

The following adds an alternate way of expanding a uniform
mask vector constructor like

  _55 = _2 ? -1 : 0;
  vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};

when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.

Instead of

        cmpl    $3, %edi
        sete    %cl
        movl    %ecx, %esi
        leal    (%rsi,%rsi), %eax
        leal    0(,%rsi,4), %r9d
        leal    0(,%rsi,8), %r8d
        orl     %esi, %eax
        orl     %r9d, %eax
        movl    %ecx, %r9d
        orl     %r8d, %eax
        movl    %ecx, %r8d
        sall    $4, %r9d
        sall    $5, %r8d
        sall    $6, %esi
        orl     %r9d, %eax
        orl     %r8d, %eax
        movl    %ecx, %r8d
        orl     %esi, %eax
        sall    $7, %r8d
        orl     %r8d, %eax
        kmovb   %eax, %k1

we then get

        cmpl    $3, %edi
        sete    %cl
        negl    %ecx
        kmovb   %ecx, %k1

Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.
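
The equivalence behind the shorter sequence can be shown in scalar code.
This is a sketch with made-up function names, assuming an 8-lane mask held
in an 8-bit integer. It is not the expander code in expr.cc.

```cpp
#include <cstdint>

namespace sketch
{
  // Piecewise construction: put the boolean into each of the 8 lanes
  // with shifts and ORs.
  inline std::uint8_t
  splat_piecewise(bool cond)
  {
    unsigned mask = 0;
    for (int lane = 0; lane < 8; ++lane)
      mask |= static_cast<unsigned>(cond) << lane;
    return static_cast<std::uint8_t>(mask);
  }

  // Uniform construction: all lanes are equal, so negating the 0/1
  // boolean yields all-zeros or all-ones, and truncation keeps the
  // low 8 lanes.
  inline std::uint8_t
  splat_by_negation(bool cond)
  {
    return static_cast<std::uint8_t>(-static_cast<int>(cond));
  }
}
```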

PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean
vectors with integer mode specially.

middle-end/110461 - pattern applying wrongly to vectors

The following guards a match.pd pattern that wasn't supposed to
apply to vectors and thus runs into TYPE_PRECISION checking. For
vector support the constant case is lacking and the pattern would
have missing optab support checking for the result operation.

PR middle-end/110461
* match.pd (bitop (convert@2 @0) (convert?@3 @1)): Disable
for VECTOR_TYPE_P.

* gcc.dg/pr110461.c: New testcase.

c/110454 - ICE with bogus TYPE_PRECISION use

The following sinks TYPE_PRECISION to properly guarded use places.

PR c/110454
gcc/c/
* c-typeck.cc (convert_argument): Sink formal_prec compute
to where TYPE_PRECISION is valid to use.

gcc/testsuite/
* gcc.dg/Wtraditional-conversion-3.c: New testcase.

A couple of va_gc_atomic tweaks

The only current user of va_gc_atomic is Ada's:

    vec<Entity_Id, va_gc_atomic>

It uses the generic gt_pch_nx routines (with gt_pch_nx being the
“note pointers” hooks), such as:

    template<typename T, typename A>
    void
    gt_pch_nx (vec<T, A, vl_embed> *v)
    {
      extern void gt_pch_nx (T &);
      for (unsigned i = 0; i < v->length (); i++)
        gt_pch_nx ((*v)[i]);
    }

It then defines gt_pch_nx routines for Entity_Id &.

The problem is that if we wanted to take the same approach for
an array of unsigned ints, we'd need to define:

    inline void gt_pch_nx (unsigned int &) { }

which would then be ambiguous with:

    inline void gt_pch_nx (unsigned int) { }

The point of va_gc_atomic is that the elements don't need to be GCed,
and so we have:

    template<typename T>
    void
    gt_ggc_mx (vec<T, va_gc_atomic, vl_embed> *v ATTRIBUTE_UNUSED)
    {
      /* Nothing to do.  Vectors of atomic types wrt GC do not need to
         be traversed.  */
    }

I think it's therefore reasonable to assume that no pointers will
need to be processed for PCH either.
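
The shape of the fix can be sketched with mock types. Everything below is a
simplified stand-in written for illustration (the vec layout, node,
note_pointer, walk and visits are hypothetical, not the declarations in
vec.h): a more specialized overload for the atomic allocator does nothing,
so no per-element hook has to exist for the element type.

```cpp
namespace sketch
{
  struct va_gc {};         // elements may contain pointers to walk
  struct va_gc_atomic {};  // elements never need walking

  // Tiny stand-in for an embedded vector.
  template<typename T, typename A>
  struct vec
  {
    T data[4];
    unsigned len;
    unsigned length() const { return len; }
    T& operator[](unsigned i) { return data[i]; }
  };

  // Counts how many elements were visited by the hook.
  inline int& visits() { static int n = 0; return n; }

  // Per-element hook, defined only for a pointer-like element type.
  struct node { int id; };
  inline void note_pointer(node&) { ++visits(); }

  // Generic walker: requires a note_pointer overload for T.
  template<typename T, typename A>
  void
  walk(vec<T, A>* v)
  {
    for (unsigned i = 0; i < v->length(); i++)
      note_pointer((*v)[i]);
  }

  // Overload for the atomic allocator: nothing to do, so T needs no
  // note_pointer overload and no ambiguity can arise.
  template<typename T>
  void
  walk(vec<T, va_gc_atomic>*)
  {
  }
}
```

Partial ordering picks the second overload for any vec<T, va_gc_atomic>, so
the generic body is never instantiated for element types such as unsigned int.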

The patch also relaxes the array_slice constructor for vec<T, va_gc> *
so that it handles all embedded vectors.

gcc/
* vec.h (gt_pch_nx): Add overloads for va_gc_atomic.
(array_slice): Relax va_gc constructor to handle all vectors
with a vl_embed layout.

gcc/ada/
* gcc-interface/decl.cc (gt_pch_nx): Remove overloads for Entity_Id.

RISC-V: Support vfadd static rounding mode by mode switching

This patch supports the static rounding mode of vfadd via mode switching,
similar to the fixed-point rounding mode. The related fsrm instructions
will be inserted accordingly.

Please *NOTE* this PATCH doesn't cover anything about the FRM dynamic mode;
it will be implemented in follow-up PATCH(s).

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
(riscv_mode_needed): Likewise.
(riscv_entity_mode_after): Likewise.
(riscv_mode_after): Likewise.
(riscv_mode_entry): Likewise.
(riscv_mode_exit): Likewise.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
for FRM.
* config/riscv/riscv.md: Add FRM register.
* config/riscv/vector-iterators.md: Add FRM type.
* config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
(fsrm): Define new insn for fsrm instruction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.

RISC-V: Allow rounding mode control for RVV floating-point add

According to the doc below, we need to support rounding mode control for
RVV floating-point, for both the static and the dynamic frm.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

To ease tracking and development, we will take several steps to support
all rounding modes for RVV floating-point.

1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
2. Support static rounding mode control by mode switch, like fixed-point.
3. Support dynamic rounding mode control by mode switch.
4. Support the rest of the floating-point instructions for frm.

Please *NOTE* this patch only allows rounding mode control for the
vfadd intrinsic API, and the related frm will be covered by step 2.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
Add macro for static frm min and max.
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): New class for floating-point with frm.
(BASE): Add vfadd for frm.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfadd_frm): Likewise.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): New struct for alu with frm.
(SHAPE): Add alu with frm.
* config/riscv/riscv-vector-builtins-shapes.h: Likewise.
* config/riscv/riscv-vector-builtins.cc
(function_checker::report_out_of_range_and_not): New function
for report out of range and not val.
(function_checker::require_immediate_range_or): New function
for checking in range or one val.
* config/riscv/riscv-vector-builtins.h: Add function decl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm.c: New test.

x86: Update model values for Alderlake, Rocketlake and Raptorlake.

Update model values for Alderlake, Rocketlake and Raptorlake according to SDM.

gcc/ChangeLog

* common/config/i386/cpuinfo.h (get_intel_cpu): Remove model value 0xa8
from Rocketlake, move model value 0xbf from Alderlake to Raptorlake.

Fix collection and processing of autoprofile data for target libs

cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
debug info since they are used to build target libs. -gtoggle was
turning off debug info for this stage.

create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
processing profile data collected while building target libraries.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Remove -gtoggle for STAGEautoprofile
* Makefile.tpl: Remove -gtoggle for STAGEautoprofile

gcc/c/ChangeLog:

* Make-lang.in: Pass correct stage cc1 when processing
profile data collected while building target libraries

gcc/cp/ChangeLog:

* Make-lang.in: Pass correct stage cc1plus when processing
profile data collected while building target libraries

gcc/lto/ChangeLog:

* Make-lang.in: Pass correct stage lto when processing
profile data collected while building target libraries

Daily bump.

testsuite: check_effective_target_lra: CRIS is LRA

Left-over from r14-383-gfaf8bea79b6256.

* lib/target-supports.exp (check_effective_target_lra): Remove
cris-*-* from expression for exceptions to LRA.

CRIS: Don't apply PATTERN to insn before validation (PR 110144)

Oops. The validation was there, but PATTERN was applied
before that. Noticeable only with rtl-checking (for example
as in the report: "--enable-checking=yes,rtl") as this
statement was only a (one of many) straggling olde-C
declare-and-initialize-at-beginning-of-block thing.

PR target/110144
* config/cris/cris.cc (cris_postdbr_cmpelim): Don't apply PATTERN
to insn before validating it.

Enable early inlining into always_inline functions

The early inliner currently skips always_inline functions, and moreover we
ignore calls from always_inline in ipa_reverse_postorder. This disables
most of the propagation done by early optimization, which is quite bad when
early inlined functions are not leaf functions, as is now quite common
in libstdc++.

Instead of fully disabling the inlining, this patch checks calls in the callee.
I am quite conservative about what can be inlined as this patch is a bit
touchy anyway. To avoid problems with always_inline functions being optimized
after early inlining, I extended inline_always_inline_functions to lazily
compute fnsummary when needed.

gcc/ChangeLog:

PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recursive;
watch for cycles; do not update overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.

gcc/testsuite/ChangeLog:

PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): For non-constant string
argument passed to CHARACTER(LEN=1),VALUE dummy, ensure proper
dereferencing and truncation of string to length 1.

gcc/testsuite/ChangeLog:

PR fortran/110360
* gfortran.dg/value_9.f90: Add tests for intermediate regression.

c++: ahead of time variable template-id coercion [PR89442]

This patch makes us coerce the arguments of a variable template-id ahead
of time, as we do for class template-ids, which causes us to immediately
diagnose template parm/arg kind mismatches and arity mismatches.

Unfortunately this causes a regression in cpp1z/constexpr-if20.C: coercing
the variable template-id m<ar, as> ahead of time means we strip it of
typedefs, yielding m<typename C<i>::q, typename C<j>::q>, but in this
stripped form we're directly using 'i' and so we expect to have captured
it. This is a variable template version of PR107437.

PR c++/89442
PR c++/107437

gcc/cp/ChangeLog:

* cp-tree.h (lookup_template_variable): Add complain parameter.
* parser.cc (cp_parser_template_id): Pass tf_warning_or_error
to lookup_template_variable.
* pt.cc (lookup_template_variable): Add complain parameter.
Coerce template arguments here ...
(finish_template_variable): ... instead of here.
(lookup_and_finish_template_variable): Check for error_mark_node
result from lookup_template_variable.
(tsubst_copy) <case TEMPLATE_ID_EXPR>: Pass complain to
lookup_template_variable.
(instantiate_template): Use build2 instead of
lookup_template_variable to build a TEMPLATE_ID_EXPR
for most_specialized_partial_spec.

gcc/testsuite/ChangeLog:

* g++.dg/cpp/pr64127.C: Expect "expected unqualified-id at end
of input" error.
* g++.dg/cpp0x/alias-decl-ttp1.C: Fix template parameter/argument
kind mismatch for variable template has_P_match_V.
* g++.dg/cpp1y/pr72759.C: Expect "template argument 1 is invalid"
error.
* g++.dg/cpp1z/constexpr-if20.C: XFAIL test due to bogus "'i' is
not captured" error.
* g++.dg/cpp1z/noexcept-type21.C: Fix arity of variable template d.
* g++.dg/diagnostic/not-a-function-template-1.C: Add default
template argument to variable template A so that A<> is valid.
* g++.dg/parse/error56.C: Don't expect "ISO C++ forbids
declaration with no type" error.
* g++.dg/parse/template30.C: Don't expect "parse error in
template argument list" error.
* g++.dg/cpp1y/var-templ82.C: New test.

d: Fix wrong code-gen when returning structs by value.

Since r13-1104, structs have compute_record_mode called too early
on them, causing them to return differently depending on the order that
types are generated in, and whether there are forward references.

This patch moves the call to compute_record_mode into its own function,
and calls it after all fields have been given a size.

PR d/106977
PR target/110406

gcc/d/ChangeLog:

* types.cc (finish_aggregate_mode): New function.
(finish_incomplete_fields): Call finish_aggregate_mode.
(finish_aggregate_type): Replace call to compute_record_mode with
finish_aggregate_mode.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr110406.d: New test.

d: Fix d_signed_or_unsigned_type is invoked for vector types (PR110193)

This function can be invoked on VECTOR_TYPE, but the implementation
assumes it works on integer types only. To fix, added a check whether
the type passed is any `__vector(T)' or non-integral type, and return
early by calling `signed_or_unsigned_type_for()' instead.

Problem was found by instrumenting TYPE_PRECISION and ICEing when
applied on VECTOR_TYPEs.

PR d/110193

gcc/d/ChangeLog:

* types.cc (d_signed_or_unsigned_type): Handle being called with any
vector or non-integral type.

c++: fix error reporting routines re-entered ICE [PR110175]

Here we get the "error reporting routines re-entered" ICE because
of an unguarded use of warning_at. While at it, I added a check
for a warning_at just above it.

PR c++/110175

gcc/cp/ChangeLog:

* typeck.cc (cp_build_unary_op): Check tf_warning before warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-110175.C: New test.

final+varasm: Change return type of predicate functions from int to bool

Also change some internal variables to bool and change return type of
compute_alignments to void.

gcc/ChangeLog:

* output.h (leaf_function_p): Change return type from int to bool.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
(maybe_assemble_visibility): Ditto.
* varasm.h (supports_one_only): Ditto.
* rtl.h (compute_alignments): Change return type from int to void.
* final.cc (app_on): Change return type from int to bool.
(compute_alignments): Change return type from int to void
and adjust function body accordingly.
(shorten_branches): Change "something_changed" variable
type from int to bool.
(leaf_function_p): Change return type from int to bool
and adjust function body accordingly.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
* varasm.cc (contains_pointers_p): Change return type from
int to bool and adjust function body accordingly.
(compare_constant): Ditto.
(maybe_assemble_visibility): Ditto.
(supports_one_only): Ditto.