git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Split up pstl/set.cc testcase

This testcase is causing some timeout issues. This patch splits the
testcase up by individual set algorithm.

libstdc++-v3:/ChangeLog:
* testsuite/25_algorithms/pstl/alg_sorting/set.cc: Delete
file.
* testsuite/25_algorithms/pstl/alg_sorting/set_difference.cc:
New file.
* testsuite/25_algorithms/pstl/alg_sorting/set_intersection.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_union.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
Likewise.

doc: Update my Contributors entry

gcc/ChangeLog:

* doc/contrib.texi (Contributors): Update my entry.

value-prof.cc: Correct edge prob calculation.

The mod-subtract optimization with ncounts==1 produced incorrect edge
probabilities due to incorrect conditional probability calculation. This
patch fixes the calculation.

Signed-off-by: Filip Kastl <filip.kastl@gmail.com>
gcc/ChangeLog:

* value-prof.cc (gimple_mod_subtract_transform): Correct edge
prob calculation.

sched: Change return type of predicate functions from int to bool

Also change some internal variables to bool.

gcc/ChangeLog:

* sched-int.h (struct haifa_sched_info): Change can_schedule_ready_p,
scehdule_more_p and contributes_to_priority indirect frunction
type from int to bool.
(no_real_insns_p): Change return type from int to bool.
(contributes_to_priority): Ditto.
* haifa-sched.cc (no_real_insns_p): Change return type from
int to bool and adjust function body accordingly.
* modulo-sched.cc (try_scheduling_node_in_cycle): Change "success"
variable type from int to bool.
(ps_insn_advance_column): Change return type from int to bool.
(ps_has_conflicts): Ditto. Change "has_conflicts"
variable type from int to bool.
* sched-deps.cc (deps_may_trap_p): Change return type from int to bool.
(conditions_mutex_p): Ditto.
* sched-ebb.cc (schedule_more_p): Ditto.
(ebb_contributes_to_priority): Change return type from
int to bool and adjust function body accordingly.
* sched-rgn.cc (is_cfg_nonregular): Ditto.
(check_live_1): Ditto.
(is_pfree): Ditto.
(find_conditional_protection): Ditto.
(is_conditionally_protected): Ditto.
(is_prisky): Ditto.
(is_exception_free): Ditto.
(haifa_find_rgns): Change "unreachable" and "too_large_failure"
variables from int to bool.
(extend_rgns): Change "rescan" variable from int to bool.
(check_live): Change return type from
int to bool and adjust function body accordingly.
(can_schedule_ready_p): Ditto.
(schedule_more_p): Ditto.
(contributes_to_priority): Ditto.

gimple-isel: Recognize vec_extract pattern.

In gimple-isel we already deduce a vec_set pattern from an
ARRAY_REF(VIEW_CONVERT_EXPR).  This patch does the same for a
vec_extract.

The code is largely similar to the vec_set one
including the addition of a can_vec_extract_var_idx_p function
in optabs.cc to check if the backend can handle a register
operand as index.  We already have can_vec_extract in
optabs-query but that one checks whether we can extract
specific modes.

With the introduction of an internal function for vec_extract
the expander must not FAIL.  For vec_set this has already been
the case so adjust the documentation accordingly.

Additionally, clarify the wording of the vector-vector case for
vec_extract.

gcc/ChangeLog:

* doc/md.texi: Document that vec_set and vec_extract must not
fail.
* gimple-isel.cc (gimple_expand_vec_set_expr): Rename this...
(gimple_expand_vec_set_extract_expr): ...to this.
(gimple_expand_vec_exprs): Call renamed function.
* internal-fn.cc (vec_extract_direct): Add.
(expand_vec_extract_optab_fn): New function to expand
vec_extract optab.
(direct_vec_extract_optab_supported_p): Add.
* internal-fn.def (VEC_EXTRACT): Add.
* optabs.cc (can_vec_extract_var_idx_p): New function.
* optabs.h (can_vec_extract_var_idx_p): Declare.

RISC-V: Support variable index in vec_extract.

This patch adds a gen_lowpart in the vec_extract expander so it properly
works with a variable index and adds tests.

gcc/ChangeLog:

* config/riscv/autovec.md: Add gen_lowpart.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Add
tests for variable index.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
Ditto.

RISC-V: Allow variable index for vec_set.

This patch enables a variable index for vec_set and adjust the tests.

gcc/ChangeLog:

* config/riscv/autovec.md: Allow register index operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
Ditto.

RISC-V: Use FRM_DYN when add the rounding mode operand

This patch would like to take FRM_DYN const rtx as the rounding mode
operand according to the RVV spec, which takes the dyn as the only
rounding mode for floating-point.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc
(function_expander::use_exact_insn): Use FRM_DYN instead of const0.
Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Change truncate to float_truncate in narrowing patterns.

This fixes a bug in the autovect FP narrowing patterns which resulted in
a combine ICE. It would try to e.g. simplify a unary operation by
simplify_const_unary_operation which obviously expects a float_truncate
and not a truncate for a floating-point mode.

gcc/ChangeLog:

* config/riscv/autovec.md: Use float_truncate.

VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer

Hi, Richard and Richi.

Address comments from Richi.

Make gs_info.ifn = LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.

I have fully tested these 4 format:

length = vf is a dummpy length,
mask = {-1,-1, ... } is a dummy mask.

1. no length, no mask
   LEN_MASK_GATHER_LOAD (..., length = vf, mask = {-1,-1,...})
2. exist length, no mask
   LEN_MASK_GATHER_LOAD (..., len, mask = {-1,-1,...})
3. exist mask, no length
   LEN_MASK_GATHER_LOAD (..., length = vf, mask)
4. both mask and length exist
   LEN_MASK_GATHER_LOAD (..., length, mask)

All of these work fine in this patch.

Here is the example:

void
f (int *restrict a,
   int *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * 4] = b[i];
    }
}

Gimple IR:

  <bb 3> [local count: 105119324]:
  _58 = (unsigned long) n_13(D);

  <bb 4> [local count: 630715945]:
  # vectp_cond.7_45 = PHI <vectp_cond.7_46(4), cond_14(D)(3)>
  # vectp_b.11_51 = PHI <vectp_b.11_52(4), b_15(D)(3)>
  # vectp_a.14_55 = PHI <vectp_a.14_56(4), a_16(D)(3)>
  # ivtmp_59 = PHI <ivtmp_60(4), _58(3)>
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [2, 2]);
  ivtmp_44 = _61 * 4;
  vect__4.9_47 = .LEN_MASK_LOAD (vectp_cond.7_45, 32B, _61, 0, { -1, ... });
  mask__24.10_49 = vect__4.9_47 != { 0, ... };
  vect__8.13_53 = .LEN_MASK_LOAD (vectp_b.11_51, 32B, _61, 0, mask__24.10_49);
  ivtmp_54 = _61 * 16;
  .LEN_MASK_SCATTER_STORE (vectp_a.14_55, { 0, 16, 32, ... }, 1, vect__8.13_53, _61, 0, mask__24.10_49);
  vectp_cond.7_46 = vectp_cond.7_45 + ivtmp_44;
  vectp_b.11_52 = vectp_b.11_51 + ivtmp_44;
  vectp_a.14_56 = vectp_a.14_55 + ivtmp_54;
  ivtmp_60 = ivtmp_59 - _61;
  if (ivtmp_60 != 0)
    goto <bb 4>; [83.33%]
  else
    goto <bb 5>; [16.67%]

Ok for trunk ?

gcc/ChangeLog:

* internal-fn.cc (internal_fn_len_index): Apply
LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer.
(internal_fn_mask_index): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
(supports_vec_scatter_store_p): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.

Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL.

RISC-V lowers the TYPE_PRECISION for MODE_VECTOR_BOOL vectors in order
to distinguish between VNx1BI, VNx2BI, VNx4BI and VNx8BI.

This patch adjusts uses of MODE_VECTOR_BOOL to use GET_MODE_PRECISION
instead of GET_MODE_BITSIZE.

The RISC-V tests are provided by Juzhe.

Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/c-family/ChangeLog:

* c-common.cc (c_common_type_for_mode): Use GET_MODE_PRECISION.

gcc/ChangeLog:

* simplify-rtx.cc (native_encode_rtx): Ditto.
(native_decode_vector_rtx): Ditto.
(simplify_const_vector_byte_offset): Ditto.
(simplify_const_vector_subreg): Ditto.
* tree.cc (build_truth_vector_type_for_mode): Ditto.
* varasm.cc (output_constant_pool_2): Ditto.

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_type_for_mode): Ditto.

gcc/go/ChangeLog:

* go-lang.cc (go_langhook_type_for_mode): Ditto.

gcc/lto/ChangeLog:

* lto-lang.cc (lto_type_for_mode): Ditto.

gcc/rust/ChangeLog:

* backend/rust-tree.cc (c_common_type_for_mode): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.

MIPS: Use unaligned access to expand block_move on r6

MIPSr6 support unaligned memory access with normal lh/sh/lw/sw/ld/sd
instructions, and thus lwl/lwr/ldl/ldr and swl/swr/sdl/sdr is removed.

For microarchitecture, these memory access instructions issue 2
operation if the address is not aligned, which is like what lwl family
do.

For some situation (such as accessing boundary of pages) on some
microarchitectures, the unaligned access may not be good enough,
then the kernel should trap&emu it: the kernel may need
-mno-unalgined-access option.

gcc/
* config/mips/mips.cc (mips_expand_block_move): don't expand for
r6 with -mno-unaligned-access option if one or both of src and
dest are unaligned. restruct: return directly if length is not const.
(mips_block_move_straight): emit_move if ISA_HAS_UNALIGNED_ACCESS.

gcc/testsuite/
* gcc.target/mips/expand-block-move-r6-no-unaligned.c: new test.
* gcc.target/mips/expand-block-move-r6.c: new test.

adjust testcase for now happening epilogue vectorization

gcc.dg/vect/slp-perm-9.c is reported to FAIL with -march=cascadelake
now which is because we now vectorize the epilogue with V2HImode
vectors after the recent change to not scrap too large vector
epilogues during transform but during analysis time.

The following adjusts the testcase to always use the existing alternate
N which avoids epilogue vectorization.

* gcc.dg/vect/slp-perm-9.c: Always use alternate N.

x86: suppress avx512f-copysign.c testcase for 32-bit

The test installed by "x86: make VPTERNLOG* usable on less than 512-bit
operands with just AVX512F" won't succeed on 32-bit, for floating point
operations being done there (by default) without using SIMD insns.

gcc/testsuite/

* gcc.target/i386/avx512f-copysign.c: Suppress for 32-bit.

x86: yet more PR target/100711-like splitting

Following two-operand bitwise operations, add another splitter to also
deal with not followed by broadcast all on its own, which can be
expressed as simple embedded broadcast instead once a broadcast operand
is actually permitted in the respective insn. While there also permit
a broadcast operand in the corresponding expander.

gcc/

PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate as a singular vpternlog.
(one_cmpl<mode>2): Allow broadcast for operand 1.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Likewise.

gcc/testsuite/

PR target/100711
* gcc.target/i386/pr100711-6.c: New test.

x86: further PR target/100711-like splitting

With respective two-operand bitwise operations now expressable by a
single VPTERNLOG, add splitters to also deal with ior and xor
counterparts of the original and-only case. Note that the splitters need
to be separate, as the placement of "not" differs in the final insns
(*iornot<mode>3, *xnor<mode>3) which are intended to pick up one half of
the result.

gcc/

PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.

gcc/testsuite/

PR target/100711
* gcc.target/i386/pr100711-4.c: New test.
* gcc.target/i386/pr100711-5.c: New test.

x86: allow memory operand for AVX2 splitter for PR target/100711

The intended broadcast (with AVX512) can very well be done right from
memory.

gcc/

PR target/100711
* config/i386/sse.md: Permit non-immediate operand 1 in AVX2
form of splitter for PR target/100711.

middle-end/110541 - VEC_PERM_EXPR documentation is off

The following adjusts the tree.def documentation about VEC_PERM_EXPR
which wasn't adjusted when the restrictions of permutes with constant
mask were relaxed.

PR middle-end/110541
* tree.def (VEC_PERM_EXPR): Adjust documentation to reflect
reality.

x86: use VPTERNLOG also for certain andnot forms

When it's the memory operand which is to be inverted, using VPANDN*
requires a further load instruction. The same can be achieved by a
single VPTERNLOG*. Add two new alternatives (for plain memory and
embedded broadcast), adjusting the predicate for the first operand
accordingly.

Two pre-existing testcases actually end up being affected (improved) by
the change, which is reflected in updated expectations there.

gcc/

PR target/93768
* config/i386/sse.md (*andnot<mode>3): Add new alternatives
for memory form operand 1.

gcc/testsuite/

PR target/93768
* gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expecations
towards generated code.
* gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
code.

x86: use VPTERNLOG for further bitwise two-vector operations

All combinations of and, ior, xor, and not involving two operands can be
expressed that way in a single insn.

gcc/

PR target/93768
* config/i386/i386.cc (ix86_rtx_costs): Further special-case
bitwise vector operations.
* config/i386/sse.md (*iornot<mode>3): New insn.
(*xnor<mode>3): Likewise.
(*<nlogic><mode>3): Likewise.
(andor): New code iterator.
(nlogic): New code attribute.
(ternlog_nlogic): Likewise.

gcc/testsuite/

PR target/93768
* gcc.target/i386/avx512-binop-not-1.h: New.
* gcc.target/i386/avx512-binop-not-2.h: New.
* gcc.target/i386/avx512f-orn-si-zmm-1.c: New test.
* gcc.target/i386/avx512f-orn-si-zmm-2.c: New test.

Fix typo in vectorizer debug message

* tree-vect-stmts.cc (vect_mark_relevant): Fix typo.

libstdc++: Disable std::forward_list tests for C++98 mode

These tests fail with -std=gnu++98/-D_GLIBCXX_DEBUG in the runtest
flags. They should require the c++11 effective target.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/forward_list/debug/iterator1_neg.cc:
Skip as UNSUPPORTED for C++98 mode.
* testsuite/23_containers/forward_list/debug/iterator3_neg.cc:
Likewise.

libstdc++: Fix std::__uninitialized_default_n for constant evaluation [PR110542]

libstdc++-v3/ChangeLog:

PR libstdc++/110542
* include/bits/stl_uninitialized.h (__uninitialized_default_n):
Do not use std::fill_n during constant evaluation.

libstdc++: Use RAII in std::vector::_M_default_append

Similar to r14-2052-gdd2eb972a5b063, replace the try-block with RAII
types for deallocating storage and destroying elements.

libstdc++-v3/ChangeLog:

* include/bits/vector.tcc (_M_default_append): Replace try-block
with RAII types.

libstdc++: Add redundant 'typename' to std::projected

This is needed by Clang 15.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (projected): Add typename.

RISC-V:Add float16 tuple type abi

gcc/ChangeLog:

* config/riscv/vector.md: Add float16 attr at sew、vlmul and ratio.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.

RISC-V:Add float16 tuple type support

This patch adds support for the float16 tuple type.

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
New types.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
* config/riscv/riscv.md: New.
* config/riscv/vector-iterators.md: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.

MIPS: Adjust mips16e2 related tests for ifcvt costing changes

A mips16e2 related test fails after the ifcvt change. The mips16e2
addition also causes a test for unrelated module to fail.

This patch adjusts branch costs when running the two affected tests.

These tests should not require the -mbranch-cost option, and
this issue needs to be addressed.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: Adjust branch cost to
encourage if-conversion.
* gcc.target/mips/movcc-3.c: Same as above.

Daily bump.

PR 110487: `(a !=/== CST1 ? CST2 : CST3)` pattern for type safety

The problem here is we might produce some values out of the type's
min/max (and/or valid values, e.g. signed booleans). The fix is to
use an integer type which has the same precision and signedness
as the original type.

Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/110487
* match.pd (a !=/== CST1 ? CST2 : CST3): Always
build a nonstandard integer and use that.

Fix PR 110487: invalid signed boolean value

This fixes the first part of this bug where `a ? -1 : 0`
would cause a value of 1 into the signed boolean value.
It fixes the problem by casting to an integer type of
the same size/signedness before doing the negative and
then casting to the type of expression.

OK? Bootstrapped and tested on x86_64.

gcc/ChangeLog:

* match.pd (a?-1:0): Cast type an integer type
rather the type before the negative.
(a?0:-1): Likewise.

xtensa: Use HARD_REG_SET instead of bare integer

gcc/ChangeLog:

* config/xtensa/xtensa.cc (machine_function, xtensa_expand_prologue):
Change to use HARD_REG_BIT and its macros.
* config/xtensa/xtensa.md
(peephole2: regmove elimination during DFmode input reload):
Likewise.

tree-optimization/110491 - PHI-OPT and undefs

The following makes sure to not make conditional undefs in PHI arguments
unconditional by folding cond ? arg1 : arg2.

PR tree-optimization/110491
* tree-ssa-phiopt.cc (match_simplify_replacement): Check
whether the PHI args are possibly undefined before folding
the COND_EXPR.

* gcc.dg/torture/pr110491.c: New testcase.

Streamer: Fix out of range memory access of machine mode

We extend the machine mode from 8 to 16 bits already. But there still
one placing missing from the streamer. It has one hard coded array
for the machine code like size 256.

In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
value of the MAX_MACHINE_MODE will grow as more and more modes are
added. While the machine mode array in tree-streamer still leave 256 as is.

Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
lto_output_init_mode_table will touch the memory out of range unexpected.

This patch would like to take the MAX_MACHINE_MODE as the size of the
array in streamer, to make sure there is no potential unexpected
memory access in future. Meanwhile, this patch also adjust some place
which has MAX_MACHINE_MODE <= 256 assumption.

Care is taken that for offload compilation, we interpret the stream-in
data in terms of the host 'MAX_MACHINE_MODE' ('file_data->mode_bits'),
which very likely is different from the offload device
'MAX_MACHINE_MODE'.

gcc/
* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
bits for machine mode table.
* lto-streamer-out.cc (lto_write_mode_table): Stream out the
HOST machine mode bits.
* lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
as the table size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
as the packing limit.
(bp_unpack_machine_mode): Ditto with 'file_data->mode_bits'.
gcc/lto/
* lto-common.cc (lto_file_finalize) [!ACCEL_COMPILER]: Initialize
'file_data->mode_bits'.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'

... instead of just 'unsigned char *mode_table'. Preparation for a forthcoming
change, where we need to capture an additional 'file_data' item, so it seems
easier to just capture that one proper.

gcc/
* lto-streamer.h (class lto_input_block): Capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table'.
* ipa-devirt.cc (ipa_odr_read_section): Adjust.
* ipa-fnsummary.cc (inline_read_section): Likewise.
* ipa-icf.cc (sem_item_optimizer::read_section): Likewise.
* ipa-modref.cc (read_section): Likewise.
* ipa-prop.cc (ipa_prop_read_section, read_replacements_section):
Likewise.
* ipa-sra.cc (isra_read_summary_section): Likewise.
* lto-cgraph.cc (input_cgraph_opt_section): Likewise.
* lto-section-in.cc (lto_create_simple_input_block): Likewise.
* lto-streamer-in.cc (lto_read_body_or_constructor)
(lto_input_toplevel_asms): Likewise.
* tree-streamer.h (bp_unpack_machine_mode): Likewise.
gcc/lto/
* lto-common.cc (lto_read_decls): Adjust.

Use mark_ssa_maybe_undefs in PHI-OPT

The following removes gimple_uses_undefined_value_p and instead
uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last
user of the other API.

* tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs.
(empty_bb_or_one_feeding_into_p): Check for them.
* tree-ssa.h (gimple_uses_undefined_value_p): Remove.
* tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.

Remove unnecessary check on scalar_niter == 0

The following removes an unnecessary check.

* tree-vect-loop.cc (vect_analyze_loop_costing): Remove
check guarding scalar_niter underflow.

tree-optimization/110376 - testcase for fixed bug

This is a new testcase for the fixed bug.

PR tree-optimization/110376
* gcc.dg/torture/pr110376.c: New testcase.

PR tree-optimization/110531 - Vect: avoid using uninitialized variable

slp_done_for_suggested_uf is used directly in vect_analyze_loop_2
without initialization, which is undefined behavior. Initialize it to false
according to the discussion.

gcc/ChangeLog:
PR tree-optimization/110531
* tree-vect-loop.cc (vect_analyze_loop_1): initialize
slp_done_for_suggested_uf to false.

tree-optimization/110228 - avoid undefs in ifcombine more thoroughly

The following replaces the simplistic gimple_uses_undefined_value_p
with the conservative mark_ssa_maybe_undefs approach as already
used by LIM and IVOPTs. This is to avoid exposing an unconditional
uninitialized read on a path from entry by if-combine.

PR tree-optimization/110228
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute):
Mark SSA may-undefs.
(bb_no_side_effects_p): Check stmt uses for undefs.

* gcc.dg/torture/pr110228.c: New testcase.
* gcc.dg/uninit-pr101912.c: Un-XFAIL.

tree-optimization/110436 - bogus live/relevant for unused pattern

When we compute liveness and relevantness we have to make sure to
handle live but not relevant stmts in a way we can later vectorize
them. When the stmt uses only operands that do not need vectorization
we can just leave such stmts in place - but not in the case they
are recognized as patterns. Since we don't have a way to cancel
pattern recognition we have to force mark such stmts as relevant.

PR tree-optimization/110436
* tree-vect-stmts.cc (vect_mark_relevant): Expand dumping,
force live but not relevant pattern stmts relevant.

* gcc.dg/pr110436.c: New testcase.

x86: Enable ENQCMD and UINTR for march=sierraforest.

Enable ENQCMD and UINTR for march=sierraforest according to Intel ISE
https://cdrdv2.intel.com/v1/dl/getContent/671368

gcc/ChangeLog

* config/i386/i386.h: Add PTA_ENQCMD and PTA_UINTR to PTA_SIERRAFOREST.
* doc/invoke.texi: Update new isa to march=sierraforest and grandridge.

ada: Do not unnecessarily use component-wise loop for slice assignment

This relaxes the condition under which Expand_Assign_Array leaves the
assignment to or from an array slice untouched. The main prerequisite
for the code generator is that everything be aligned on byte boundaries
and Is_Possibly_Unaligned_Slice is too strong a predicate for this, so
it is replaced by the combination of Possible_Bit_Aligned_Component and
Is_Bit_Packed_Array, modulo a change to Possible_Bit_Aligned_Component
to take into account the specific case of slices.

gcc/ada/

* exp_ch5.adb (Expand_Assign_Array): Adjust comment above the
calls to Possible_Bit_Aligned_Component on the LHS and RHS. Do not
call Is_Possibly_Unaligned_Slice in the slice case.
* exp_util.ads (Component_May_Be_Bit_Aligned): Add For_Slice
boolean parameter.
(Possible_Bit_Aligned_Component): Likewise.
* exp_util.adb (Component_May_Be_Bit_Aligned): Do not return False
for the slice of a small record or bit-packed array component.
(Possible_Bit_Aligned_Component): Pass For_Slice in recursive
calls, except in the slice case where True is passed, as well as
in call to Component_May_Be_Bit_Aligned.

ada: Small adjustments to new procedure Expand_Unchecked_Union_Equality

The procedure is not stable under repeated invocation. Now it may be called
twice on the same node, for example during the expansion of the renaming of
the predefined equality operator after the unchecked union type is frozen.

gcc/ada/

* exp_ch4.ads (Expand_Unchecked_Union_Equality): Only take a
single parameter.
* exp_ch4.adb (Expand_Unchecked_Union_Equality): Add guard against
repeated invocation on the same node.
* exp_ch6.adb (Expand_Call): Only pass a single actual parameter
in the call to Expand_Unchecked_Union_Equality.

ada: Add No_Use_Of_Attribute & No_Use_Of_Pragma to gnat_rm

gcc/ada/

* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst:
add No_Use_Of_Attribute & No_Use_Of_Pragma restrictions.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Fix list of inherited subprograms in query for GNATprove

The query Inherited_Subprograms was returning a list containing
some subprograms whose overridding was also in the list, when
interfaces was present. This was an issue for GNATprove. Now propose
a mode for this function to filter out overridden primitives.

gcc/ada/

* sem_disp.adb (Inherited_Subprograms): Add parameter to filter
out results.
* sem_disp.ads: Likewise.

middle-end/110495 - avoid associating constants with (VL) vectors

When trying to associate (v + INT_MAX) + INT_MAX we are using
the TREE_OVERFLOW bit to check for correctness. That isn't
working for VECTOR_CSTs and it can't in general when one considers
VL vectors. It looks like it should work for COMPLEX_CSTs but
I didn't try to single out _Complex int in this change.

The following makes sure that for vectors we use the fallback of
using unsigned arithmetic when associating the above to
v + (INT_MAX + INT_MAX).

PR middle-end/110495
* tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs
since we do not set TREE_OVERFLOW on those since the
introduction of VL vectors.
* match.pd (x +- CST +- CST): For VECTOR_CST do not look
at TREE_OVERFLOW to determine validity of association.

* gcc.dg/tree-ssa/addadd-2.c: Amend.
* gcc.dg/tree-ssa/forwprop-27.c: Adjust.

tree-optimization/110310 - move vector epilogue disabling to analysis phase

The following removes late deciding to elide vectorized epilogues to
the analysis phase and also avoids altering the epilogues niter.
The costing part from vect_determine_partial_vectors_and_peeling is
moved to vect_analyze_loop_costing where we use the main loop
analysis to constrain the epilogue scalar iterations.

I have not tried to integrate this with vect_known_niters_smaller_than_vf.

It seems the for_epilogue_p parameter in
vect_determine_partial_vectors_and_peeling is largely useless and
we could compute that in the function itself.

PR tree-optimization/110310
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Move costing part ...
(vect_analyze_loop_costing): ... here. Integrate better
estimate for epilogues from ...
(vect_analyze_loop_2): Call vect_determine_partial_vectors_and_peeling
with actual epilogue status.
* tree-vect-loop-manip.cc (vect_do_peeling): ... here and
avoid cancelling epilogue vectorization.
(vect_update_epilogue_niters): Remove. No longer update
epilogue LOOP_VINFO_NITERS.

* gcc.target/i386/pr110310.c: New testcase.
* gcc.dg/vect/slp-perm-12.c: Disable epilogue vectorization.

Revert "RISC-V: Fix one typo of FRM dynamic definition"

This reverts commit 3d95a524d4746ceb3065f92f30a5679afb88d16a.

gcc/ChangeLog:

* config/riscv/vector.md: Revert changes.

Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

Hi, Richi and Richard.

Base one the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html

I change len_mask_gather_load/len_mask_scatter_store order into:
{len,bias,mask}

We adjust adding len and mask using using add_len_and_mask_args
which is same as partial_load/parial_store.

Now, the codes become more reasonable and easier maintain.

This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider this following case:

void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such case into following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)

This patch doesn't apply such patterns into vectorizer, just add patterns
and update the documents.

Will send patch which apply such patterns into vectorizer soon after this
patch is approved.

Ok for trunk?

gcc/ChangeLog:

* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

RISC-V: Optimize local AVL propagation

I recently noticed that current VSETVL pass has a unnecessary restriction on local
AVL propgation.

Consider this following case:

+                      insn 1: vsetvli a5,a3,e8,mf4,ta,mu
+                      insn 2: vsetvli zero,a5,e32,m1,ta,ma
+                      ...
+                      vle32.v v1,0(a1)
+                      vsetvli a2,zero,e32,m1,ta,ma
+                      vadd.vv v1,v1,v1
+                      vsetvli zero,a5,e32,m1,ta,ma
+                      vse32.v v1,0(a0)
+                      ...
+                      insn 3: sub     a3,a3,a5
+                      ...

We failed to elide insn 2 (vsetvl insn) since insn 3 is modifying "a3" AVL.
Actually, we don't really care about insn 3 since we should only check and make sure
there is no insn between insn 1 and insn 2 that modifies "a3" AVL. Then, we can propgate
AVL "a3" from insn 1 to insn 2. Finally, insn 2 is eliminated.

After this patch:

+                      insn 1: vsetvli a5,a3,e8,mf4,ta,ma
+                      ...
+                      vle32.v v1,0(a1)
+                      vsetvli a2,zero,e32,m1,ta,ma
+                      vadd.vv v1,v1,v1
+                      vsetvli zero,a5,e32,m1,ta,ma
+                      vse32.v v1,0(a0)
+                      ...
+                      insn 3: sub     a3,a3,a5
+                      ...

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_insn_info::parse_insn): Add early break.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_prop-1.c: New test.

CRIS: Replace unspec CRIS_UNSPEC_SWAP_BITS with rtx bitreverse

This is just expected to be a change in representation.
No code is expected to change; no new tests are added.

* config/cris/cris.md (CRIS_UNSPEC_SWAP_BITS): Remove.
("cris_swap_bits", "ctzsi2"): Use bitreverse instead.

dwarf2out.cc (mem_loc_descriptor): Handle BITREVERSE

This seems to have just been overlooked when introducing
BITREVERSE. Note that the function name mem_loc_descriptor
is a misnomer; it'd better be called rtx_loc_descriptor or
any_loc_descriptor, because "anything" RTX can end up here.
To wit, when introducing new RTL that ends up as code or for
other reasons appear in debug expressions, don't forget to
update this function. This was observed by building
libstdc+++ for cris-elf with a patch replacing the
CRIS_UNSPEC_SWAP_BITS by bitreverse, as hitting the
internal-error-generating default case.

Looking at the BSWAP, POPCOUNT and ROTATE cases, BITREVERSE
can probably be fully expressed as DWARF code if need be,
but let's start with not throwing an internal error.

gcc:
* dwarf2out.cc (mem_loc_descriptor): Handle BITREVERSE.

Daily bump.

libstdc++: Fix <iosfwd> synopsis test

The <syncstream> header is only supported for the cxx11 ABI. The
declarations of basic_syncbuf, basic_osyncstream, syncbuf and
osyncstream were already correctly guarded by a check for
_GLIBCXX_USE_CXX11_ABI, but the wsyncbuf and wosyncstream declarations
were not.

libstdc++-v3/ChangeLog:

* testsuite/27_io/headers/iosfwd/synopsis.cc: Make wsyncbuf and
wosyncstream depend on _GLIBCXX_USE_CXX11_ABI.

libstdc++: Enable OpenMP 5.0 pragmas in PSTL headers

This reapplies r10-1314-g32bab8b6ad0a90 which was lost in the recent
PSTL rebase from upstream.

* include/pstl/pstl_config.h (_PSTL_PRAGMA_SIMD_SCAN,
_PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN, _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN):
Define to OpenMP 5.0 pragmas even for GCC 10.0+.
(_PSTL_UDS_PRESENT): Define to 1 for GCC 10.0+.

libstdc++: Qualify calls to std::_Destroy and _Destroy_aux

These calls should be qualified to prevent ADL, which can cause errors
for incomplete types that are associated classes.

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (_Destroy): Qualify call.
* include/bits/stl_construct.h (_Destroy, _Destroy_n): Likewise.
* testsuite/23_containers/vector/cons/destroy-adl.cc: New test.

RISC-V: Add support for vector crypto extensions

This series adds basic support for the vector crypto extensions:
  * Zvbb
  * Zvbc
  * Zvkg
  * Zvkned
  * Zvkhn[a,b]
  * Zvksed
  * Zvksh
  * Zvkn
  * Zvknc
  * Zvkng
  * Zvks
  * Zvksc
  * Zvksg
  * Zvkt

This patch is based on the v20230620 version of the Vector Cryptography
specification. The specification is frozen and can be found here:
  https://github.com/riscv/riscv-crypto/releases/tag/v20230620

Binutils support is merged as 9fdc1b157b6e72f7dd98851a240c5fdb386a558e.
All extensions come with (passing) tests for the feature test macros.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add support for zvbb,
zvbc, zvkg, zvkned, zvknha, zvknhb, zvksed, zvksh, zvkn,
zvknc, zvkng, zvks, zvksc, zvksg, zvkt and the implied subsets.
* config/riscv/arch-canonicalize: Add canonicalization info for
zvkn, zvknc, zvkng, zvks, zvksc, zvksg.
* config/riscv/riscv-opts.h (MASK_ZVBB): New macro.
(MASK_ZVBC): Likewise.
(TARGET_ZVBB): Likewise.
(TARGET_ZVBC): Likewise.
(MASK_ZVKG): Likewise.
(MASK_ZVKNED): Likewise.
(MASK_ZVKNHA): Likewise.
(MASK_ZVKNHB): Likewise.
(MASK_ZVKSED): Likewise.
(MASK_ZVKSH): Likewise.
(MASK_ZVKN): Likewise.
(MASK_ZVKNC): Likewise.
(MASK_ZVKNG): Likewise.
(MASK_ZVKS): Likewise.
(MASK_ZVKSC): Likewise.
(MASK_ZVKSG): Likewise.
(MASK_ZVKT): Likewise.
(TARGET_ZVKG): Likewise.
(TARGET_ZVKNED): Likewise.
(TARGET_ZVKNHA): Likewise.
(TARGET_ZVKNHB): Likewise.
(TARGET_ZVKSED): Likewise.
(TARGET_ZVKSH): Likewise.
(TARGET_ZVKN): Likewise.
(TARGET_ZVKNC): Likewise.
(TARGET_ZVKNG): Likewise.
(TARGET_ZVKS): Likewise.
(TARGET_ZVKSC): Likewise.
(TARGET_ZVKSG): Likewise.
(TARGET_ZVKT): Likewise.
* config/riscv/riscv.opt: Introduction of riscv_zv{b,k}_subext.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvbb.c: New test.
* gcc.target/riscv/zvbc.c: New test.
* gcc.target/riscv/zvkg.c: New test.
* gcc.target/riscv/zvkn-1.c: New test.
* gcc.target/riscv/zvkn.c: New test.
* gcc.target/riscv/zvknc-1.c: New test.
* gcc.target/riscv/zvknc-2.c: New test.
* gcc.target/riscv/zvknc.c: New test.
* gcc.target/riscv/zvkned.c: New test.
* gcc.target/riscv/zvkng-1.c: New test.
* gcc.target/riscv/zvkng-2.c: New test.
* gcc.target/riscv/zvkng.c: New test.
* gcc.target/riscv/zvknha.c: New test.
* gcc.target/riscv/zvknhb.c: New test.
* gcc.target/riscv/zvks-1.c: New test.
* gcc.target/riscv/zvks.c: New test.
* gcc.target/riscv/zvksc-1.c: New test.
* gcc.target/riscv/zvksc-2.c: New test.
* gcc.target/riscv/zvksc.c: New test.
* gcc.target/riscv/zvksed.c: New test.
* gcc.target/riscv/zvksg-1.c: New test.
* gcc.target/riscv/zvksg-2.c: New test.
* gcc.target/riscv/zvksg.c: New test.
* gcc.target/riscv/zvksh.c: New test.
* gcc.target/riscv/zvkt.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Use chain_next on eh_landing_pad_d for GTY (PR middle-end/110510)

The backtrace in the bug report suggest there is a running out of
stack during GC collection, because of a long chain of eh_landing_pad_d.
This might fix that by adding chain_next onto eh_landing_pad_d's GTY marker.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/110510
* except.h (struct eh_landing_pad_d): Add chain_next GTY.

testsuite, Darwin: Remove an unnecessary flags addition.

The addition of the multiply_defined suppress flag has been handled for some
considerable time now in the Darwin specs; remove it from the testsuite libs.
Avoid duplicates in the specs.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config/darwin.h: Avoid duplicate multiply_defined specs on
earlier Darwin versions with shared libgcc.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp: Remove additional flag handled
by Darwin specs.

gcc/testsuite/ChangeLog:

* lib/g++.exp: Remove additional flag handled by Darwin specs.
* lib/obj-c++.exp: Likewise.

tree+ggc: Change return type of predicate functions from int to bool

Also change internal variable from int to bool.

gcc/ChangeLog:

* tree.h (tree_int_cst_equal): Change return type from int to bool.
(operand_equal_for_phi_arg_p): Ditto.
(tree_map_base_marked_p): Ditto.
* tree.cc (contains_placeholder_p): Update function body
for bool return type.
(type_cache_hasher::equal): Ditto.
(tree_map_base_hash): Change return type
from int to void and adjust function body accordingly.
(tree_int_cst_equal): Ditto.
(operand_equal_for_phi_arg_p): Ditto.
(get_narrower): Change "first" variable to bool.
(cl_option_hasher::equal): Update function body for bool return type.
* ggc.h (ggc_set_mark): Change return type from int to bool.
(ggc_marked_p): Ditto.
* ggc-page.cc (gt_ggc_mx): Change return type
from int to void and adjust function body accordingly.
(ggc_set_mark): Ditto.

Middle-end: Change order of LEN_MASK_LOAD/LEN_MASK_STORE arguments

Hi, Richard. I fix the order as you suggeted.

Before this patch, the order is {len,mask,bias}.

Now, after this patch, the order becomes {len,bias,mask}.

Since you said we should not need 'internal_fn_bias_index', the bias index should always be the len index + 1.
I notice LEN_STORE order is {len,vector,bias}, to make them consistent, I reorder into LEN_STORE {len,bias,vector}.
Just like MASK_STORE {mask,vector}.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/autovec.md: Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* config/riscv/riscv-v.cc (expand_load_store): Ditto.
* doc/md.texi: Ditto.
* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(add_len_and_mask_args): New function.
(expand_partial_load_optab_fn): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(expand_partial_store_optab_fn): Ditto.
(internal_fn_len_index): New function.
(internal_fn_mask_index): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* internal-fn.h (internal_fn_len_index): New function.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.

ada: Fix renaming of predefined equality operator for unchecked union types

The problem is that the predefined equality operator for unchecked union
types is implemented out of line by invoking a function that takes more
parameters than the two operands, which means that the renaming is not
seen as type conforming with this function and, therefore, is rejected.

The way out is to implement these additional parameters as "extra" formal
parameters, since this kind of parameters is not taken into account for
semantic checks. The change also factors out the duplicated generation
of actuals for these additional parameters into a single procedure.

gcc/ada/

* exp_ch3.ads (Build_Variant_Record_Equality): Add Spec_Id as second
parameter.
* exp_ch3.adb (Build_Variant_Record_Equality): For unchecked union
types, build the additional parameters as extra formal parameters.
(Expand_Freeze_Record_Type.Build_Variant_Record_Equality): Pass
Empty as Spec_Id in call to Build_Variant_Record_Equality.
* exp_ch4.ads (Expand_Unchecked_Union_Equality): New procedure.
* exp_ch4.adb (Expand_Composite_Equality): In the presence of a
function implementing composite equality, do not special case the
unchecked union types, and only convert the operands if the base
types are not the same like in Build_Equality_Call.
(Build_Equality_Call): Do not special case the unchecked union types
and relocate the operands only once.
(Expand_N_Op_Eq): Do not special case the unchecked union types.
(Expand_Unchecked_Union_Equality): New procedure implementing the
specific expansion of calls to the predefined equality function.
* exp_ch6.adb (Is_Unchecked_Union_Equality): New predicate.
(Expand_Call): Call Is_Unchecked_Union_Equality to determine whether
to call Expand_Unchecked_Union_Equality or Expand_Call_Helper.
* exp_ch8.adb (Build_Body_For_Renaming): Set Has_Delayed_Freeze flag
earlier on Id and pass Id in call to Build_Variant_Record_Equality.

ada: Fix discrepancy in expansion of untagged record equality

The expansion of the predefined equality operator for untagged record types
can be done either in line, i.e. into the component-wise comparison of the
operands, or out of line, i.e. into a call to a function implementing this
comparison, and the heuristics of the selection are essentially based on the
complexity of the implementation.

For discriminated record types with a variant part, which comprise unchecked
union types, the expansion is always done out of line. For nondiscriminated
types, the expansion is done in line, unless one of the components is of a
record type for which a user-defined equality operator exists, in which case
the expansion is done out of line.

For the third case, i.e. discriminated record types without a variant part,
the expansion is always done in line. Now given that the discriminants are
considered as mere components for the purpose of predefined equality in this
case, there does not seem to be any reason for treating it differently from
the second case above.

gcc/ada/

* exp_ch3.adb (Build_Untagged_Equality): Rename into...
(Build_Untagged_Record_Equality): ...this.
(Expand_Freeze_Record_Type): Adjust to above renaming and invoke
the procedure also for discriminated types without a variant part.

ada: Fix small inaccuracy in implementation of B.3.3(20/2)

This is the clause about inferable discriminants in unchecked unions.

gcc/ada/

* sem_util.adb (Has_Inferable_Discriminants): In the case of a
component with a per-object constraint, also return true if the
enclosing object is not of an unchecked union type.
In the default case, remove a useless call to Base_Type.

PR modula2/110125 variables reported as uninitialized when set inside WITH

The modula-2 static analysis incorrectly identifies variables as
uninitialized if they are initialized within a WITH statement.  This bug
fix re-implements the variable static analysis and will detect simple
pointer record fields being accessed before being initialized.
The static analysis is limited to the first basic block in a procedure.
It does not check variant records, arrays or sets.  A new option
-Wuninit-variable-checking will turn on the new semantic checking
(-Wall also enables the new checking).

gcc/ChangeLog:

PR modula2/110125
* doc/gm2.texi (Semantic checking): Include examples using
-Wuninit-variable-checking.

gcc/m2/ChangeLog:

PR modula2/110125
* Make-lang.in (GM2-COMP-BOOT-DEFS): Add M2SymInit.def.
(GM2-COMP-BOOT-MODS): Add M2SymInit.mod.
* gm2-compiler/M2BasicBlock.mod: Formatting changes.
* gm2-compiler/M2Code.mod: Remove import of VariableAnalysis from
M2Quads.  Import VariableAnalysis from M2SymInit.mod.
* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
Add debugging print for a component.
(TypeConstFullyDeclared): Call RememberType for every type.
* gm2-compiler/M2GenGCC.mod (CodeReturnValue): Add parameter to
GetQuadOtok.
(CodeBecomes): Add parameter to GetQuadOtok.
(CodeXIndr): Add parameter to GetQuadOtok.
* gm2-compiler/M2Optimize.mod (ReduceBranch): Reformat and
preserve operand token positions when reducing the branch
quadruples.
(ReduceGoto): Reformat.
(FoldMultipleGoto): Reformat.
(KnownReachable): Reformat.
* gm2-compiler/M2Options.def (UninitVariableChecking): New
variable declared and exported.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Options.mod (SetWall): Set
UninitVariableChecking.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Quads.def (PutQuadOtok): Exported and declared.
(VariableAnalysis): Removed.
* gm2-compiler/M2Quads.mod (PutQuadOtok): New procedure.
(doVal): Reformatted.
(MarkAsWrite): Reformatted.
(MarkArrayAsWritten): Reformatted.
(doIndrX): Use PutQuadOtok.
(MakeRightValue): Use GenQuadOtok.
(MakeLeftValue): Use GenQuadOtok.
(CheckReadBeforeInitialized): Remove.
(IsNeverAltered): Reformat.
(DebugLocation): New procedure.
(BuildDesignatorPointer): Use GenQuadO to preserve operand token
position.
(BuildRelOp): Use GenQuadOtok ditto.
* gm2-compiler/SymbolTable.def (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
* gm2-compiler/SymbolTable.mod (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
(LRInitDesc): New type.
(SymVar): InitState new field.
(MakeVar): Initialize InitState.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New function declaration.
* gm2-lang.cc (gm2_langhook_handle_option): Detect
OPT_Wuninit_variable_checking and call SetUninitVariableChecking.
* lang.opt: Add Wuninit-variable-checking.
* gm2-compiler/M2SymInit.def: New file.
* gm2-compiler/M2SymInit.mod: New file.

gcc/testsuite/ChangeLog:

PR modula2/110125
* gm2/switches/uninit-variable-checking/fail/testinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallvec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithnoptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

RISC-V: Support vfwmul.vv combine lowering

Consider the following complicate case:
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }

TEST_TYPE (double, float)

Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
So the combine PASS will first try to combine one of the combine extension, and then combine
the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1))
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (reg 0) (reg 3))

First step of combine:
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (float_extend: (reg 1) (reg 3))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))

So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).

gcc/ChangeLog:

* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@"
into "*" in pattern name which simplifies build files.
(*pred_single_widen_mul<any_extend:su><mode>): Ditto.
(*pred_single_widen_mul<mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.

aarch64: Fix vector-to-vector vec_extract

The documentation says:

-------------------------------------------------------------------------
@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
@item @samp{vec_extract@var{m}@var{n}}
Extract given field from the vector value.  [...]  The
@var{n} mode is the mode of the field or vector of fields that should be
extracted, [...]
If @var{n} is a vector mode, the index is counted in units of that mode.
-------------------------------------------------------------------------

However, Robin pointed out that, in practice, the index is counted
in whole multiples of @var{n}.  These are the semantics that x86
and target-independent code follow.

This patch updates the aarch64 pattern to match, which also removes
the FAIL.  I think Robin has patches that update the documentation
and make more use of the de facto semantics.

I haven't found an existing testcase that shows the difference.
We do now use the pattern for:

union u { int32x4_t x; int32x2_t y[2]; };
int32x2_t f(int32x4_t x) { union u u = { x }; return u.y[1]; }

but we were already generating perfect code for it.  Because of that,
it didn't really seem worth adding a specific dump test.

gcc/
* config/aarch64/aarch64-simd.md (vec_extract<mode><Vhalf>): Expect
the index to be 0 or 1.

Revert "RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering"

This reverts commit 47e6dcb597b2d4abcab13c9dea0cc7d2131b6419.

RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

RISC-V: Fix one typo of FRM dynamic definition

This patch would like to fix one typo that take rdn instead of dyn by
mistake.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector.md: Fix typo.

tree-optimization/110506 - ICE in pattern recog with TYPE_PRECISION

The following re-orders checks to make sure we check TYPE_PRECISION
on an integral type.

PR tree-optimization/110506
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Re-order
TYPE_PRECISION access with INTEGRAL_TYPE_P check.

* gcc.dg/pr110506-2.c: New testcase.

tree-optimization/110506 - bogus non-zero mask in CCP for vector types

get_value_for_expr was blindlessly using TYPE_PRECISION to produce
a mask for vector typed entities which the new tree checking now
catches.

PR tree-optimization/110506
* tree-ssa-ccp.cc (get_value_for_expr): Check for integral
type before relying on TYPE_PRECISION to produce a nonzero mask.

* gcc.dg/pr110506.c: New testcase.

testsuite: Add vect_float_strict to testcase [PR 110381]

As discussed in the PR, the testcase needs
/* { dg-require-effective-target vect_float_strict } */

2023-02-03 Andrew Pinski <apinski@marvell.com>

PR tree-optimization/110381
gcc/testsuite/

* gcc.dg/vect/pr110381.c: Add vect_float_strict.

MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain conditions

This patch allows mips16e2 acts the same with -O1~3
when generating ZEB/ZEH instead of ANDI under
the -O0 option, which shrinks the code size.

gcc/ChangeLog:
* config/mips/mips.md(*and<mode>3_mips16): Generates
ZEB/ZEH instructions.

MIPS: Add CACHE instruction for mips16e2

This patch adds CACHE instruction from mips16e2
with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc(mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Defined a new macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.

MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2

The MIPS16e2 ASE has PREF, LL and SC instructions,
they use 9 bits immediate, like mips32r6.
The MIPS32 PRE-R6 uses 16 bits immediate.

gcc/ChangeLog:

* config/mips/mips.h(ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.

MIPS: Add load/store word left/right instructions for mips16e2

This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc(mips_expand_ins_as_unaligned_store):
Add logics for generating instruction.
* config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md(mov_<load>l): Generates instructions.
(mov_<load>r): Same as above.
(mov_<store>l): Adjusted for the conditions above.
(mov_<store>r): Same as above.
(mov_<store>l_mips16e2): Add machine description for `define_insn mov_<store>l_mips16e2`.
(mov_<store>r_mips16e2): Add machine description for `define_insn mov_<store>r_mips16e2`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.

MIPS: Add LUI instruction for mips16e2

This patch adds LUI instruction from mips16e2
with corresponding test.

gcc/ChangeLog:

* config/mips/mips.cc(mips_symbol_insns_1): Generates LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* config/mips/mips.md: Same as above

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.

MIPS: Add bitwise instructions for mips16e2

There are shortened bitwise instructions in the mips16e2 ASE,
for instance, ANDI, ORI/XORI, EXT, INS etc. .

This patch adds these instrutions with corresponding tests.

gcc/ChangeLog:

* config/mips/constraints.md(Yz): New constraints for mips16e2.
* config/mips/mips-protos.h(mips_bit_clear_p): Declared new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc(mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h(ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md(extended_mips16): Generates EXT and INS instructions.
(*and<mode>3): Generates INS instruction.
(*and<mode>3_mips16): Generates EXT, INS and ANDI instructions.
(ior<mode>3): Add logics for ORI instruction.
(*ior<mode>3_mips16_asmacro): Generates ORI instrucion.
(*ior<mode>3_mips16): Add logics for XORI instruction.
(*xor<mode>3_mips16): Generates XORI instrucion.
(*extzv<mode>): Add logics for EXT instruction.
(*insv<mode>): Add logics for INS instruction.
* config/mips/predicates.md(bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logics for generating bitwise instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.

MIPS: Add instruction about global pointer register for mips16e2

The mips16e2 ASE uses eight general-purpose registers
from mips32, with some special-purpose registers,
these registers are GPRs: s0-1, v0-1, a0-3, and
special registers: t8, gp, sp, ra.

As mentioned above, the special register gp is
used in mips16e2, which is the global pointer register,
it is used by some of the instructions in the ASE,
for instance, ADDIU, LB/LBU, etc. .

This patch adds these instructions with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc(mips_regno_mode_ok_for_base_p): Generate instructions
that uses global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h(MIPS16_GP_LOADS): Defined a new macro.
(GLOBAL_POINTER_REGNUM): Moved to machine description `mips.md`.
* config/mips/mips.md(GLOBAL_POINTER_REGNUM): Moved to here from above.
(*lowsi_mips16_gp):New `define_insn *low<mode>_mips16`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.

MIPS: Add MOVx instructions support for mips16e2

This patch adds MOVx instructions from mips16e2
(movn,movz,movtn,movtz) with corresponding tests.

gcc/ChangeLog:

* config/mips/mips.h(ISA_HAS_CONDMOVE): Add condition for ISA_HAS_MIPS16E2.
* config/mips/mips.md(*mov<GPR:mode>_on_<MOVECC:mode>): Add logics for MOVx insts.
(*mov<GPR:mode>_on_<MOVECC:mode>_mips16e2): Generate MOVx instruction.
(*mov<GPR:mode>_on_<GPR2:mode>_ne): Add logics for MOVx insts.
(*mov<GPR:mode>_on_<GPR2:mode>_ne_mips16e2): Generate MOVx instruction.
* config/mips/predicates.md(reg_or_0_operand_mips16e2): New predicate for MOVx insts.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.

MIPS: Add basic support for mips16e2

The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some addition.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).

This patch adds basic support for mips16e2 used by the
following series of patches.

gcc/ChangeLog:

* config/mips/mips.cc(mips_file_start): Add mips16e2 info
for output file.
* config/mips/mips.h(__mips_mips16e2): Defined a new
predefine macro.
(ISA_HAS_MIPS16E2): Defined a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option..

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp(mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle the recognization of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.

Daily bump.

d: Fix testcase failure of gdc.dg/Wbuiltin_declaration_mismatch2.d.

Seen at least on aarch64-*-darwin, the parameters used to instantiate
the shufflevector intrinsic meant the return type was __vector(int[1]),
which resulted in the error:

vector type '__vector(int[1])' is not supported on this platform.

All instantiations have now been fixed so the expected warning/error is
now given by the compiler.

gcc/testsuite/ChangeLog:

* gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests.

tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]

The match_uaddc_usubc matching doesn't require that the second
.{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is
at most one. So, in the weird case where the REALPART_EXPR of it isn't
present, we shouldn't ICE trying to replace that REALPART_EXPR with
REALPART_EXPR of .U{ADD,SUB}C result.

2023-07-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/110508
* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with
REALPART_EXPR opf nlhs if re2 is non-NULL.

* gcc.dg/pr110508.c: New test.

xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS

Because both smin and smax requiring TARGET_MINMAX are essential to the
RTL representation.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p):
Simplify.
* config/xtensa/xtensa.md (*xtensa_clamps):
Add TARGET_MINMAX to the condition.

xtensa: Fix missing mode warning in "*eqne_INT_MIN"

gcc/ChangeLog:

* config/xtensa/xtensa.md (*eqne_INT_MIN):
Add missing ":SI" to the match_operator.

Darwin, Objective-C: Support -fconstant-cfstrings [PR108743].

This support the -fconstant-cfstrings option as used by clang (and
expect by some build scripts) as an alias to the target-specific
-mconstant-cfstrings.

The documentation is also updated to reflect that the 'f' option is
only available on Darwin, and to add the 'm' option to the Darwin
section of the invocation text.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/108743

gcc/ChangeLog:

* config/darwin.opt: Add fconstant-cfstrings alias to
mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect
that the fconstant-cfstrings is a target-option alias and to
add the missing mconstant-cfstrings option description to the
Darwin section.

libphobos: Handle Darwin Arm and AArch64 in fibre context asm.

This code currently fails to build because it contains ELF-
specific directives. This patch excludes those directives when
the platform is Darwin.

We do not expect switching fibres between threads to be safe here
either owing to the possible caching of TLS pointers.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libphobos/ChangeLog:

* libdruntime/config/aarch64/switchcontext.S: Exclude ELF-
specific constructs for Darwin.
* libdruntime/config/arm/switchcontext.S: Likewise.
* libdruntime/core/thread/fiber.d: Disable switching fibres
between threads.

d: Add testcase from PR108962

The issue was fixed in r14-2232.

PR d/108962

gcc/testsuite/ChangeLog:

* gdc.dg/pr108962.d: New test.

d: Fix core.volatile.volatileLoad discarded if result is unused

The first pass of code generation in the D front-end splits up all
compound expressions and discards expressions that have no side effects.
This included calls to the `volatileLoad' intrinsic if its result was
not used, causing such calls to be eliminated from the program.

We already set TREE_THIS_VOLATILE on the expression, however the
tree documentation says if this bit is set in an expression, so is
TREE_SIDE_EFFECTS. So set TREE_SIDE_EFFECTS on the expression too.
This prevents any early discarding from occuring.

PR d/110516

gcc/d/ChangeLog:

* intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the
expanded expression.
(expand_volatile_store): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr110516a.d: New test.
* gdc.dg/torture/pr110516b.d: New test.

Daily bump.

d: Fix accesses of immutable arrays using constant index still bounds checked

Starts setting TREE_READONLY against specific kinds of VAR_DECLs, so
that the middle-end/optimization passes can more aggressively constant
fold D code that makes use of `immutable' or `const'.

PR d/110514

gcc/d/ChangeLog:

* decl.cc (get_symbol_decl): Set TREE_READONLY on certain kinds of
const and immutable variables.
* expr.cc (ExprVisitor::visit (ArrayLiteralExp *)): Set TREE_READONLY
on immutable dynamic array literals.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110514a.d: New test.
* gdc.dg/pr110514b.d: New test.
* gdc.dg/pr110514c.d: New test.
* gdc.dg/pr110514d.d: New test.

libphobos, testsuite: Disable forkgc2 on Darwin [PR103944]

It hangs the testsuite (requiring manual intervention to kill the
spawned processes) which breaks CI. The reason for the hang id not
clear. This skips the test for now (xfail does not work).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR d/103944

libphobos/ChangeLog:

* testsuite/libphobos.gc/forkgc2.d: Skip for Darwin.

d: Don't generate code that throws exceptions when compiling with `-fno-exceptions'

The version flags for RTMI, RTTI, and exceptions was unconditionally
predefined.  These are now only predefined if the feature flag is
enabled.  It was noticed that there was no `-fexceptions' definition
inside d/lang.opt, so the detection of the exceptions option flag was
only partially working.  Once that was fixed, a few places in the
front-end implementation were found to fall fowl of `nothrow' rules,
these have been fixed upstream and backported here as well.

Reviewed-on: https://github.com/dlang/dmd/pull/15357
     https://github.com/dlang/dmd/pull/15360

PR d/110471

gcc/d/ChangeLog:

* d-builtins.cc (d_init_versions): Predefine D_ModuleInfo,
D_Exceptions, and D_TypeInfo only if feature is enabled.
* lang.opt: Add -fexceptions.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110471a.d: New test.
* gdc.dg/pr110471b.d: New test.
* gdc.dg/pr110471c.d: New test.

Add testcase from PR25623

gcc/testsuite/ChangeLog:

PR tree-optimization/25623
* gfortran.dg/pr25623.f90: New test.

Fix profile update in copy-header

Most common source of profile mismatches is now copyheader pass.  The reason is that
in comon case the duplicated header condition will become constant true and that needs
changes in the loop exit condition probability.

While this can be done by jump threading it is not, since it gives up on loops.
Copy header pass now has logic to prove that first exit will become true, so this
patch adds necessary pumbing to the profile updating.
This is done in gimple_duplicate_sese_region in a way that is specific for this
particular case.  I think general case is kind-of unsolvable and loop-ch is the
only user of the infrastructure.  If we later invent some new users, maybe we
can export the region and region_copy arrays and let user to do the update.

With the patch we now get:

Pass dump id and name            |static mismat|dynamic mismatch
                                 |in count     |in count
107t cunrolli                    |      3    +3|        19237       +19237
127t ch                          |     13   +10|        19237
131t dom                         |     39   +26|        19237
133t isolate-paths               |     47    +8|        19237
134t reassoc                     |     49    +2|        19237
136t forwprop                    |     53    +4|       226943      +207706
159t cddce                       |     61    +8|       242222       +15279
161t ldist                       |     62    +1|       242222
172t ifcvt                       |     66    +4|       415472      +173250
173t vect                        |    143   +77|     10859784    +10444312
176t cunroll                     |    294  +151|    150357763   +139497979
183t loopdone                    |    291    -3|    150289533       -68230
194t tracer                      |    322   +31|    153230990     +2941457
195t fre                         |    317    -5|    153230990
197t dom                         |    286   -31|    154448079     +1217089
199t threadfull                  |    293    +7|    154724763      +276684
200t vrp                         |    297    +4|    155042448      +317685
204t dce                         |    294    -3|    155017073       -25375
206t sink                        |    292    -2|    155017073
211t cddce                       |    298    +6|    155018657        +1584
255t optimized                   |    296    -2|    155018657
256r expand                      |    273   -23|    154592622      -426035
258r into_cfglayout              |    268    -5|    154592661          +39
275r loop2_unroll                |    272    +4|    159701866     +5109205
291r ce2                         |    270    -2|    159723509
312r pro_and_epilogue            |    290   +20|    159792505       +68996
315r jump2                       |    296    +6|    164234016     +4441511
323r bbro                        |    294    -2|    159385430     -4848586

So ch introduces 10 new mismatches while originally it did 308.  At bbro the
number of mismatches dropped from 432 to 294.
Most offender is now cunroll pass. I think it is the case where loop has multiple
exits and one of exits becomes to be false in all but last peeled iteration.

This is another case where non-trivial loop update is needed.

Honza

gcc/ChangeLog:

* tree-cfg.cc (gimple_duplicate_sese_region): Add elliminated_edge
parmaeter; update profile.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Rename to ...
(static_loop_exit): ... this; return the edge to be elliminated.
(ch_base::copy_headers): Handle profile updating for eliminated exits.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ifc-20040816-1.c: Reduce number of mismatches
from 2 to 1.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: New test.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: New test.

i386: Add STV support for DImode and SImode rotations by constant.

This patch implements scalar-to-vector (STV) support for DImode and SImode
rotations by constant bit counts.  Scalar rotations are almost always
optimal on x86, requiring only one or two instructions, but it is also
possible to implement these efficiently with SSE2, requiring only one
or two instructions for SImode rotations and at most 3 instructions for
DImode rotations.  This allows GCC to STV rotations with a small or no
penalty if there are other (net) benefits to converting a chain.  An
example of the benefits is shown below, which is based upon the BLAKE2
cryptographic hash function:

unsigned long long a,b,c,d;

unsigned long rot(unsigned long long x, int y)
{
  return (x<<y) | (x>>(64-y));
}

void foo()
{
  d = rot(d ^ a,32);
  c = c + d;
  b = rot(b ^ c,24);
  a = a + b;
  d = rot(d ^ a,16);
  c = c + d;
  b = rot(b ^ c,63);
}

where with -m32 -O2 -msse2

Before (59 insns, 247 bytes):

foo: pushl   %edi
        xorl    %edx, %edx
        pushl   %esi
        pushl   %ebx
        subl    $16, %esp
        movq    a, %xmm1
        movq    d, %xmm0
        movq    b, %xmm2
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        movd    %xmm2, %ecx
        psrlq   $32, %xmm2
        movd    %xmm2, %ebx
        movl    %ecx, %eax
        shldl   $24, %ebx, %ecx
        shldl   $24, %eax, %ebx
        movd    %ebx, %xmm4
        movd    %ecx, %xmm2
        punpckldq       %xmm4, %xmm2
        movdqa  .LC0, %xmm4
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        movd    %xmm1, %esi
        psrlq   $32, %xmm1
        movd    %xmm1, %edi
        movl    %esi, %eax
        shldl   $16, %edi, %esi
        shldl   $16, %eax, %edi
        movd    %esi, %xmm1
        movd    %edi, %xmm3
        punpckldq       %xmm3, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        movd    %xmm0, 8(%esp)
        psrlq   $32, %xmm0
        movl    8(%esp), %eax
        movd    %xmm0, 12(%esp)
        movl    12(%esp), %edx
        shrdl   $1, %edx, %eax
        xorl    %edx, %edx
        movl    %eax, b
        movl    %edx, b+4
        addl    $16, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        ret

After (32 insns, 165 bytes):
        movq    a, %xmm1
        xorl    %edx, %edx
        movq    d, %xmm0
        movq    b, %xmm2
        movdqa  .LC0, %xmm4
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        pshufd  $68, %xmm2, %xmm2
        psrldq  $5, %xmm2
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        pshuflw $147, %xmm1, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        pshufd  $20, %xmm0, %xmm0
        psrlq   $1, %xmm0
        pshufd  $136, %xmm0, %xmm0
        pand    %xmm4, %xmm0
        movq    %xmm0, b
        ret

2023-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
gains/costs for ROTATE and ROTATERT (by an integer constant).
(general_scalar_chain::convert_rotate): New helper function to
convert a DImode or SImode rotation by an integer constant into
SSE vector form.
(general_scalar_chain::convert_insn): Call the new convert_rotate
for ROTATE and ROTATERT.
(general_scalar_to_vector_candidate_p): Consider ROTATE and
ROTATERT to be candidates if the second operand is an integer
constant, valid for a rotation (or shift) in the given mode.
* config/i386/i386-features.h (general_scalar_chain): Add new
helper method convert_rotate.

gcc/testsuite/ChangeLog
* gcc.target/i386/rotate-6.c: New test case.
* gcc.target/i386/sse2-stv-1.c: Likewise.