git.ipfire.org Git - thirdparty/gcc.git/log

tree-ssa-math-opts: Use element_precision.

The recent TYPE_PRECISION changes to detect improper usage
cause an ICE in divmod_candidate_p for RVV when called with
a vector type. Therefore, use element_precision instead.

gcc/ChangeLog:

* tree-ssa-math-opts.cc (divmod_candidate_p): Use
element_precision.

[Committed] Add -mmove-max=128 -mstore-max=128 to pieces-memcmp-2.c

Adding -mmove-max=128 and -mstore-max=128 to the dg-options of the
recently added gcc.target/i386/pieces-memcmp-2.c avoids changing the
intent of this testcase when adding -march=cascadelake to RUNTESTFLAGS.
Committed as obvious.

2023-06-29 Roger Sayle <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: Specify that 128-bit
comparisons are desired, to see if 256-bit instructions are
generated inappropriately (fixes test on -march=cascadelake).

tree-optimization/110460 - fend off vector types from vectorizer

The following makes fending off existing vector types from vectorization
also apply to word_mode vector types. I've chosen to add a positive
list of allowed scalar types here for clarity.

PR tree-optimization/110460
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Only allow integral, pointer and scalar float type scalar_type.

Avoid adding loop-carried ops to long chains

Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-carried ops in a
separate chain.
   E.g.
   x_1 = phi(x_0, x_2)
   y_1 = phi(y_0, y_2)

   a + b + c + d + e + x1 + y1

   SSA1 = a + b;
   SSA2 = c + d;
   SSA3 = SSA1 + e;
   SSA4 = SSA3 + SSA2;
   SSA5 = x1 + y1;
   SSA6 = SSA4 + SSA5;

With the patch applied, these test cases improved by 32%~100%.

S242:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}

Case 1:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

Case 2:
for (int i = 1; i < LEN_1D; ++i) {
    a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

The value is the execution time
A: original version
B: with FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409(base on A)
C: with current patch(base on B)

  A   B   C B/A         C/A
s242 2.859 5.152 2.859 1.802028681 1
case 1 5.489 5.488 3.511 0.999818 0.64
case 2 7.216 7.499 4.885 1.039218 0.68

gcc/ChangeLog:

PR tree-optimization/110148
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle loop-carried
ops in this function.

[testsuite] tolerate enabled but missing language frontends

When a language is enabled but we run the testsuite against a tree in
which the frontend compiler is not present, help.exp fails.  It
recognizes the output pattern for a disabled language, but not a
missing frontend.  Extend the pattern so that it covers both cases.

for  gcc/testsuite/ChangeLog

* lib/options.exp (check_for_options_with_filter): Handle
missing frontend compiler like disabled language.

middle-end/110452 - bad code generation with AVX512 mask splat

The following adds an alternate way of expanding a uniform
mask vector constructor like

  _55 = _2 ? -1 : 0;
  vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};

when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.

Instead of

        cmpl    $3, %edi
        sete    %cl
        movl    %ecx, %esi
        leal    (%rsi,%rsi), %eax
        leal    0(,%rsi,4), %r9d
        leal    0(,%rsi,8), %r8d
        orl     %esi, %eax
        orl     %r9d, %eax
        movl    %ecx, %r9d
        orl     %r8d, %eax
        movl    %ecx, %r8d
        sall    $4, %r9d
        sall    $5, %r8d
        sall    $6, %esi
        orl     %r9d, %eax
        orl     %r8d, %eax
        movl    %ecx, %r8d
        orl     %esi, %eax
        sall    $7, %r8d
        orl     %r8d, %eax
        kmovb   %eax, %k1

we then get

        cmpl    $3, %edi
        sete    %cl
negl    %ecx
kmovb   %ecx, %k1

Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.

PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean
vectors with integer mode specially.

middle-end/110461 - pattern applying wrongly to vectors

The following guards a match.pd pattern that wasn't supposed to
apply to vectors and thus runs into TYPE_PRECISION checking. For
vector support the constant case is lacking and the pattern would
have missing optab support checking for the result operation.

PR middle-end/110461
* match.pd (bitop (convert@2 @0) (convert?@3 @1)): Disable
for VECTOR_TYPE_P.

* gcc.dg/pr110461.c: New testcase.

c/110454 - ICE with bogus TYPE_PRECISION use

The following sinks TYPE_PRECISION to properly guarded use places.

PR c/110454
gcc/c/
* c-typeck.cc (convert_argument): Sink formal_prec compute
to where TYPE_PRECISION is valid to use.

gcc/testsuite/
* gcc.dg/Wtraditional-conversion-3.c: New testcase.

A couple of va_gc_atomic tweaks

The only current user of va_gc_atomic is Ada's:

    vec<Entity_Id, va_gc_atomic>

It uses the generic gt_pch_nx routines (with gt_pch_nx being the
“note pointers” hooks), such as:

    template<typename T, typename A>
    void
    gt_pch_nx (vec<T, A, vl_embed> *v)
    {
      extern void gt_pch_nx (T &);
      for (unsigned i = 0; i < v->length (); i++)
gt_pch_nx ((*v)[i]);
    }

It then defines gt_pch_nx routines for Entity_Id &.

The problem is that if we wanted to take the same approach for
an array of unsigned ints, we'd need to define:

    inline void gt_pch_nx (unsigned int &) { }

which would then be ambiguous with:

    inline void gt_pch_nx (unsigned int) { }

The point of va_gc_atomic is that the elements don't need to be GCed,
and so we have:

    template<typename T>
    void
    gt_ggc_mx (vec<T, va_gc_atomic, vl_embed> *v ATTRIBUTE_UNUSED)
    {
      /* Nothing to do.  Vectors of atomic types wrt GC do not need to
be traversed.  */
    }

I think it's therefore reasonable to assume that no pointers will
need to be processed for PCH either.

The patch also relaxes the array_slice constructor for vec<T, va_gc> *
so that it handles all embedded vectors.

gcc/
* vec.h (gt_pch_nx): Add overloads for va_gc_atomic.
(array_slice): Relax va_gc constructor to handle all vectors
with a vl_embed layout.

gcc/ada/
* gcc-interface/decl.cc (gt_pch_nx): Remove overloads for Entity_Id.

RISC-V: Support vfadd static rounding mode by mode switching

This patch would like to support the vfadd static round mode similar to
the fixed-point. Then the related fsrm instructions will be inserted
correlatively.

Please *NOTE* this PATCH doesn't cover anything about FRM dynamic mode,
it will be implemented in the underlying PATCH(s).

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
(riscv_mode_needed): Likewise.
(riscv_entity_mode_after): Likewise.
(riscv_mode_after): Likewise.
(riscv_mode_entry): Likewise.
(riscv_mode_exit): Likewise.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
for FRM.
* config/riscv/riscv.md: Add FRM register.
* config/riscv/vector-iterators.md: Add FRM type.
* config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
(fsrm): Define new insn for fsrm instruction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.

RISC-V: Allow rounding mode control for RVV floating-point add

According to the doc as below, we need to support the rounding mode of
the RVV floating-point, both the static and dynamice frm.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

For tracking and development friendly, We will take some steps to support
all rounding modes for the RVV floating-point rounding modes.

1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
2. Support static rounding mode control by mode switch, like fixed-point.
3. Support dynamice round mode control by mode switch.
4. Support the rest floating-point instructions for frm.

Please *NOTE* this patch only allow the rounding mode control for the
vfadd intrinsic API, and the related frm will be coverred by step 2.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
Add macro for static frm min and max.
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): New class for floating-point with frm.
(BASE): Add vfadd for frm.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfadd_frm): Likewise.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): New struct for alu with frm.
(SHAPE): Add alu with frm.
* config/riscv/riscv-vector-builtins-shapes.h: Likewise.
* config/riscv/riscv-vector-builtins.cc
(function_checker::report_out_of_range_and_not): New function
for report out of range and not val.
(function_checker::require_immediate_range_or): New function
for checking in range or one val.
* config/riscv/riscv-vector-builtins.h: Add function decl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm.c: New test.

x86: Update model values for Alderlake, Rocketlake and Raptorlake.

Update model values for Alderlake, Rocketlake and Raptorlake according to SDM.

gcc/ChangeLog

* common/config/i386/cpuinfo.h (get_intel_cpu): Remove model value 0xa8
from Rocketlake, move model value 0xbf from Alderlake to Raptorlake.

Fix collection and processing of autoprofile data for target libs

cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
debug info since they are used to build target libs. -gtoggle was
turning off debug info for this stage.

create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
processing profile data collected while building target libraries.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Remove -gtoggle for STAGEautoprofile
* Makefile.tpl: Remove -gtoggle for STAGEautoprofile

gcc/c/ChangeLog:

* Make-lang.in: Pass correct stage cc1 when processing
profile data collected while building target libraries

gcc/cp/ChangeLog:

* Make-lang.in: Pass correct stage cc1plus when processing
profile data collected while building target libraries

gcc/lto/ChangeLog:

* Make-lang.in: Pass correct stage lto when processing
profile data collected while building target libraries

Daily bump.

testsuite: check_effective_target_lra: CRIS is LRA

Left-over from r14-383-gfaf8bea79b6256.

* lib/target-supports.exp (check_effective_target_lra): Remove
cris-*-* from expression for exceptions to LRA.

CRIS: Don't apply PATTERN to insn before validation (PR 110144)

Oops. The validation was there, but PATTERN was applied
before that. Noticeable only with rtl-checking (for example
as in the report: "--enable-checking=yes,rtl") as this
statement was only a (one of many) straggling olde-C
declare-and-initialize-at-beginning-of-block thing.

PR target/110144
* config/cris/cris.cc (cris_postdbr_cmpelim): Don't apply PATTERN
to insn before validating it.

Enable early inlining into always_inline functions

Early inliner currently skips always_inline functions and moreover we ignore
calls from always_inline in ipa_reverse_postorder. This leads to disabling
most of propagation done using early optimization that is quite bad when
early inline functions are not leaf functions, which is now quite common
in libstdc++.

This patch instead of fully disabling the inline checks calls in callee.
I am quite conservative about what can be inlined as this patch is bit
touchy anyway. To avoid problems with always_inline being optimized
after early inline I extended inline_always_inline_functions to lazilly
compute fnsummary when needed.

gcc/ChangeLog:

PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recrusive;
watch for cycles; do not updat overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.

gcc/testsuite/ChangeLog:

PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): For non-constant string
argument passed to CHARACTER(LEN=1),VALUE dummy, ensure proper
dereferencing and truncation of string to length 1.

gcc/testsuite/ChangeLog:

PR fortran/110360
* gfortran.dg/value_9.f90: Add tests for intermediate regression.

c++: ahead of time variable template-id coercion [PR89442]

This patch makes us coerce the arguments of a variable template-id ahead
of time, as we do for class template-ids, which causes us to immediately
diagnose template parm/arg kind mismatches and arity mismatches.

Unfortunately this causes a regression in cpp1z/constexpr-if20.C: coercing
the variable template-id m<ar, as> ahead of time means we strip it of
typedefs, yielding m<typename C<i>::q, typename C<j>::q>, but in this
stripped form we're directly using 'i' and so we expect to have captured
it. This is a variable template version of PR107437.

PR c++/89442
PR c++/107437

gcc/cp/ChangeLog:

* cp-tree.h (lookup_template_variable): Add complain parameter.
* parser.cc (cp_parser_template_id): Pass tf_warning_or_error
to lookup_template_variable.
* pt.cc (lookup_template_variable): Add complain parameter.
Coerce template arguments here ...
(finish_template_variable): ... instead of here.
(lookup_and_finish_template_variable): Check for error_mark_node
result from lookup_template_variable.
(tsubst_copy) <case TEMPLATE_ID_EXPR>: Pass complain to
lookup_template_variable.
(instantiate_template): Use build2 instead of
lookup_template_variable to build a TEMPLATE_ID_EXPR
for most_specialized_partial_spec.

gcc/testsuite/ChangeLog:

* g++.dg/cpp/pr64127.C: Expect "expected unqualified-id at end
of input" error.
* g++.dg/cpp0x/alias-decl-ttp1.C: Fix template parameter/argument
kind mismatch for variable template has_P_match_V.
* g++.dg/cpp1y/pr72759.C: Expect "template argument 1 is invalid"
error.
* g++.dg/cpp1z/constexpr-if20.C: XFAIL test due to bogus "'i' is
not captured" error.
* g++.dg/cpp1z/noexcept-type21.C: Fix arity of variable template d.
* g++.dg/diagnostic/not-a-function-template-1.C: Add default
template argument to variable template A so that A<> is valid.
* g++.dg/parse/error56.C: Don't expect "ISO C++ forbids
declaration with no type" error.
* g++.dg/parse/template30.C: Don't expect "parse error in
template argument list" error.
* g++.dg/cpp1y/var-templ82.C: New test.

d: Fix wrong code-gen when returning structs by value.

Since r13-1104, structs have have compute_record_mode called too early
on them, causing them to return differently depending on the order that
types are generated in, and whether there are forward references.

This patch moves the call to compute_record_mode into its own function,
and calls it after all fields have been given a size.

PR d/106977
PR target/110406

gcc/d/ChangeLog:

* types.cc (finish_aggregate_mode): New function.
(finish_incomplete_fields): Call finish_aggregate_mode.
(finish_aggregate_type): Replace call to compute_record_mode with
finish_aggregate_mode.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr110406.d: New test.

d: Fix d_signed_or_unsigned_type is invoked for vector types (PR110193)

This function can be invoked on VECTOR_TYPE, but the implementation
assumes it works on integer types only. To fix, added a check whether
the type passed is any `__vector(T)' or non-integral type, and return
early by calling `signed_or_unsigned_type_for()' instead.

Problem was found by instrumenting TYPE_PRECISION and ICEing when
applied on VECTOR_TYPEs.

PR d/110193

gcc/d/ChangeLog:

* types.cc (d_signed_or_unsigned_type): Handle being called with any
vector or non-integral type.

c++: fix error reporting routines re-entered ICE [PR110175]

Here we get the "error reporting routines re-entered" ICE because
of an unguarded use of warning_at. While at it, I added a check
for a warning_at just above it.

PR c++/110175

gcc/cp/ChangeLog:

* typeck.cc (cp_build_unary_op): Check tf_warning before warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-110175.C: New test.

final+varasm: Change return type of predicate functions from int to bool

Also change some internal variables to bool and change return type of
compute_alignments to void.

gcc/ChangeLog:

* output.h (leaf_function_p): Change return type from int to bool.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
(maybe_assemble_visibility): Ditto.
* varasm.h (supports_one_only): Ditto.
* rtl.h (compute_alignments): Change return type from int to void.
* final.cc (app_on): Change return type from int to bool.
(compute_alignments): Change return type from int to void
and adjust function body accordingly.
(shorten_branches): Change "something_changed" variable
type from int to bool.
(leaf_function_p): Change return type from int to bool
and adjust function body accordingly.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
* varasm.cc (contains_pointers_p): Change return type from
int to bool and adjust function body accordingly.
(compare_constant): Ditto.
(maybe_assemble_visibility): Ditto.
(supports_one_only): Ditto.

cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

Fixes: 6a2e8dcbbd4bab3
Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds an check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR debug/110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

tree-optimization/110434 - avoid <retval> ={v} {CLOBBER} from NRV

When NRV replaces a local variable with <retval> it also replaces
occurences in clobbers. This leads to <retval> being clobbered
before the return of it which is strictly invalid but harmless in
practice since there's no pass after NRV which would remove
earlier stores.

The following fixes this nevertheless.

PR tree-optimization/110434
* tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
VAR we replace with <retval>.

Make mve_fp_fpu[12].c accept single or double precision FPU

This tests currently expect a directive containing .fpu fpv5-sp-d16
and thus may fail if the test is executed for instance with
-march=armv8.1-m.main+mve.fp+fp.dp

This patch accepts either fpv5-sp-d16 or fpv5-d16 to avoid the failure.

2023-06-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c: Fix .fpu
scan-assembler.
* gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c: Likewise.

Make nomve_fp_1.c require arm_fp

If GCC is configured with the default (soft) -mfloat-abi, and we don't
override the target_board test flags appropriately,
gcc.target/arm/mve/general-c/nomve_fp_1.c fails for lack of
-mfloat-abi=softfp or -mfloat-abi=hard, because it doesn't use
dg-add-options arm_v8_1m_mve (on purpose, see comment in the test).

Require and use the options needed for arm_fp to fix this problem.

2023-06-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/general-c/nomve_fp_1.c: Require arm_fp.

tree-optimization/110451 - hoist invariant compare after interchange

The following adjusts the cost model of invariant motion to consider
[VEC_]COND_EXPRs and comparisons producing a data value as expensive.
For 503.bwaves_r this avoids an unnecessarily high vectorization
factor because of an integer comparison besides data operations on
double.

PR tree-optimization/110451
* tree-ssa-loop-im.cc (stmt_cost): [VEC_]COND_EXPR and
tcc_comparison are expensive.

* gfortran.dg/vect/pr110451.f: New testcase.

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-28 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Remove reference to class_pointer.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.

gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test

i386: Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).

This patch fixes some very odd (unanticipated) code generation by
compare_by_pieces with -m32 -mavx, since the recent addition of the
cbranchoi4 pattern.  The issue is that cbranchoi4 is available with
TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT
which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
-m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
comparisons to 256-bits before performing PTEST.

This patch fixes this by providing a cbranchti4 pattern that's available
with either TARGET_64BIT or TARGET_SSE4_1.

For the test case below (again from PR 104610):

int foo(char *a)
{
    static const char t[] = "0123456789012345678901234567890";
    return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

GCC with -m32 -O2 -mavx currently produces the bonkers:

foo:    pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %esp
        subl    $64, %esp
        movl    8(%ebp), %eax
        vmovdqa .LC0, %xmm4
        movl    $0, 48(%esp)
        vmovdqu (%eax), %xmm2
        movl    $0, 52(%esp)
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm2, 32(%esp)
        vmovdqa %xmm4, (%esp)
        vmovdqa (%esp), %ymm5
        vpxor   32(%esp), %ymm5, %ymm0
        vptest  %ymm0, %ymm0
        jne     .L2
        vmovdqu 16(%eax), %xmm7
        movl    $0, 48(%esp)
        movl    $0, 52(%esp)
        vmovdqa %xmm7, 32(%esp)
        vmovdqa .LC1, %xmm7
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm7, (%esp)
        vmovdqa (%esp), %ymm1
        vpxor   32(%esp), %ymm1, %ymm0
        vptest  %ymm0, %ymm0
        je      .L6
.L2: movl    $1, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret
.L6: xorl    %eax, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret

with this patch, we now generate the (slightly) more sensible:

foo: vmovdqa .LC0, %xmm0
        movl    4(%esp), %eax
        vpxor   (%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        jne     .L2
        vmovdqa .LC1, %xmm0
        vpxor   16(%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        je      .L5
.L2: movl    $1, %eax
        xorl    $1, %eax
        ret
.L5: xorl    %eax, %eax
        xorl    $1, %eax
        ret

2023-06-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
for TImode comparisons on 32-bit architectures.
* config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
SWIM1248x to exclude/avoid TImode being conditional on -m64.
(cbranchti4): New define_expand for TImode on both TARGET_64BIT
and/or with TARGET_SSE4_1.
* config/i386/predicates.md (ix86_timode_comparison_operator):
New predicate that depends upon TARGET_64BIT.
(ix86_timode_comparison_operand): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: New test case.

i386: Fix FAIL of gcc.target/i386/pr78794.c on ia32.

This patch fixes that FAIL of gcc.target/i386/pr78794.c on ia32, which
is caused by minor STV rtx_cost differences with -march=silvermont.
It turns out that generic tuning results in pandn, but the lack of
accurate parameterization for COMPARE in compute_convert_gain combined
with small differences in scalar<->SSE costs on silvermont results in
this DImode chain not being converted.

The solution is to provide more accurate costs/gains for converting
(DImode and SImode) comparisons.

I'd been holding off of doing this as I'd thought it would be possible
to turn pandn;ptestz into ptestc (for an even bigger scalar-to-vector
win) but I've recently realized that these optimizations (as I've
implemented them) occur in the wrong order (stv2 occurs after
combine), so it isn't easy for STV to convert CCZmode into CCCmode.
Doh! Perhaps something can be done in peephole2.

2023-06-28 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/78794
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate gains for conversion of scalar comparisons to
PTEST.

Add cold attribute to throw wrappers and terminate

PR middle-end/109849
* include/bits/c++config (std::__terminate): Mark cold.
* include/bits/functexcept.h: Mark everything as cold.
* libsupc++/exception: Mark terminate and unexpected as cold.

tree-optimization/110443 - prevent SLP splat of gathers

The following prevents non-grouped load SLP in case the element
to splat is from a gather operation. While it should be possible
to support this it is not similar to the single element interleaving
case I was trying to mimic here.

PR tree-optimization/110443
* tree-vect-slp.cc (vect_build_slp_tree_1): Reject non-grouped
gather loads.

* gcc.dg/torture/pr110443.c: New testcase.

rs6000: Add two peephole patterns for "mr." insn

When investigating the issue mentioned in PR87871#c30 - if compare
and move pattern benefits before RA, I checked the assembly generated
for SPEC2017 and found that certain insn sequences aren't converted to
"mr." instructions.
Following two sequence are never to be combined to "mr." pattern as
there is no register link between them. This patch adds two peephole2
patterns to convert them to "mr." instructions.

cmp 0,3,0
mr 4,3

mr 4,3
cmp 0,3,0

The patch also creates a new mode iterator which decided by
TARGET_POWERPC64.  This mode iterator is used in "mr." and its split
pattern.  The original P iterator is improper when -m32/-mpowerpc64 is
set.  In this situation, the "mr." should compares the whole 64-bit
register with 0 other than the low 32-bit one.

gcc/
* config/rs6000/rs6000.md (peephole2 for compare_and_move): New.
(peephole2 for move_and_compare): New.
(mode_iterator WORD): New.  Set the mode to SI/DImode by
TARGET_POWERPC64.
(*mov<mode>_internal2): Change the mode iterator from P to WORD.
(split pattern for compare_and_move): Likewise.

gcc/testsuite/
* gcc.dg/rtl/powerpc/move_compare_peephole_32.c: New.
* gcc.dg/rtl/powerpc/move_compare_peephole_64.c: New.

RISC-V: Support vfwmacc combine lowering

This patch adds combine pattern as follows:

1. (set (reg) (fma (float_extend:reg)(float_extend:reg)(reg)))
   This pattern allows combine: vfwcvt + vfwcvt + vfmacc ==> vwfmacc.

2. (set (reg) (fma (float_extend:reg)(reg)(reg)))
   This pattern is the intermediate IR that enhances the combine optimizations.
   Since for the complicate situation, combine pass can not combine both operands
   of multiplication at the first time, it will try to first combine at the first
   stage: (set (reg) (fma (float_extend:reg)(reg)(reg))). Then combine another
   extension of the other operand at the second stage.

   This can enhance combine optimization for the following case:

define TEST_TYPE(TYPE1, TYPE2)                                                 \
    __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                       \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                                 \
dst2[i] += (TYPE1) a2[i] * (TYPE1) b[i];                               \
dst3[i] += (TYPE1) a2[i] * (TYPE1) a[i];                               \
dst4[i] += (TYPE1) a[i] * (TYPE1) b2[i];                               \
      }                                                                        \
  }

define TEST_ALL()                                                              \
  TEST_TYPE (int16_t, int8_t)                                                  \
  TEST_TYPE (uint16_t, uint8_t)                                                \
  TEST_TYPE (int32_t, int16_t)                                                 \
  TEST_TYPE (uint32_t, uint16_t)                                               \
  TEST_TYPE (int64_t, int32_t)                                                 \
  TEST_TYPE (uint64_t, uint32_t)                                               \
  TEST_TYPE (float, _Float16)                                                  \
  TEST_TYPE (double, float)

TEST_ALL ()

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fma<mode>): New pattern.
(*single_widen_fma<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-8.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c: New test.

rs6000: Splat vector small V2DI constants with vspltisw and vupkhsw

This patch adds a new insn for vector splat with small V2DI constants on P8.
If the value of constant is in RANGE (-16, 15) but not 0 or -1, it can be
loaded with vspltisw and vupkhsw on P8.

gcc/
PR target/104124
* config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Rename
to...
(altivec_vupkhs<VU_char>_direct): ...this.
* config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
predicate to test if a constant can be loaded with vspltisw and
vupkhsw.
(easy_vector_constant): Call vspltisw_vupkhsw_constant_p to Check if
a vector constant can be synthesized with a vspltisw and a vupkhsw.
* config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p):
Declare.
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
function to return true if OP mode is V2DI and can be synthesized
with vupkhsw and vspltisw.
* config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
constants with vspltisw and vupkhsw.

gcc/testsuite/
PR target/104124
* gcc.target/powerpc/pr104124.c: New.

Enable ranger for ipa-prop

gcc/ChangeLog:

PR tree-optimization/110377
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Pass statement to
the ranger query.
(ipa_analyze_node): Enable ranger.

gcc/testsuite/ChangeLog:

PR tree-optimization/110377
* gcc.dg/ipa/pr110377.c: New test.

Add testcase for PR 110444

This testcase was fixed after r14-2135-gd915762ea9043da85 and
there was no testcase for it before so adding one is a good thing.

Committed as obvious after testing the testcase to make sure it works.

gcc/testsuite/ChangeLog:

PR tree-optimization/110444
* gcc.c-torture/compile/pr110444-1.c: New test.

Prevent TYPE_PRECISION on VECTOR_TYPEs

The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled. This should avoid wrong-code
in cases like PR110182 and instead ICE.

It also introduces a TYPE_PRECISION_RAW accessor and adjusts
places I found that are eligible to use that.

* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
(TYPE_PRECISION_RAW): Provide raw access to the precision
field.
* tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
(gimple_canonical_types_compatible_p): Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Stream TYPE_PRECISION_RAW.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.

gcc/lto/
* lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.

c++: inherited constructor attributes

Inherited constructors are like constructor clones; they don't exist from
the language perspective, so they should copy the attributes in the same
way. But it doesn't make sense to copy alias or ifunc attributes in either
case. Unlike handle_copy_attribute, we do want to copy inlining attributes.

The discussion of PR110334 pointed out that we weren't copying the
always_inline attribute, leading to poor inlining choices.

PR c++/110334

gcc/cp/ChangeLog:

* cp-tree.h (clone_attrs): Declare.
* method.cc (implicitly_declare_fn): Use it for inherited
constructor.
* optimize.cc (clone_attrs): New.
(maybe_clone_body): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nodiscard-inh1.C: New test.

Add leafy mode for zero-call-used-regs

Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
nonleaf functions, respectively.

for gcc/ChangeLog

* doc/extend.texi (zero-call-used-regs): Document leafy and
variants thereof.
* flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
LEAFY and variants.
* function.cc (gen_call_ued_regs_seq): Set only_used for leaf
functions in leafy mode.
* opts.cc (zero_call_used_regs_opts): Add leafy and variants.

for gcc/testsuite/ChangeLog

* c-c++-common/zero-scratch-regs-leafy-1.c: New.
* c-c++-common/zero-scratch-regs-leafy-2.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-2.c: New.

[testsuite] note pitfall in how outputs.exp sets gld

This patch documents a glitch in gcc.misc-tests/outputs.exp: it checks
whether the linker is GNU ld, and uses that to decide whether to
expect collect2 to create .ld1_args files under -save-temps, but
collect2 bases that decision on whether HAVE_GNU_LD is set, which may
be false zero if the linker in use is GNU ld.  Configuring
--with-gnu-ld fixes this misalignment.  Without that, atsave tests are
likely to fail, because without HAVE_GNU_LD, collect2 won't use @file
syntax to run the linker (so it won't create .ld1_args files).

Long version: HAVE_GNU_LD is set when (i) DEFAULT_LINKER is set during
configure, pointing at GNU ld; (ii) --with-gnu-ld is passed to
configure; or (iii) config.gcc sets gnu_ld=yes.  If a port doesn't set
gnu_ld, and the toolchain isn't configured so as to assume GNU ld,
configure and thus collect2 conservatively assume the linker doesn't
support @file arguments.

But outputs.exp can't see how configure set HAVE_GNU_LD (it may be
used to test an installed compiler), and upon finding that the linker
used by the compiler is GNU ld, it will expect collect2 to use @file
arguments when running the linker.  If that assumption doesn't hold,
atsave tests will fail.

for  gcc/testsuite/ChangeLog

* gcc.misc-tests/outputs.exp (gld): Note a known mismatch and
record a workaround.

c++: C++26 constexpr cast from void* [PR110344]

P2768 allows static_cast from void* to ob* in constant evaluation if the
pointer does in fact point to an object of the appropriate type.
cxx_fold_indirect_ref already does the work of finding such an object if it
happens to be a subobject rather than the outermost object at that address,
as in constexpr-voidptr2.C.

P2768
PR c++/110344

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update __cpp_constexpr.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): In C++26, allow cast
from void* to the type of a pointed-to object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/constexpr-voidptr1.C: New test.
* g++.dg/cpp26/constexpr-voidptr2.C: New test.
* g++.dg/cpp26/feat-cxx26.C: New test.

testsuite: std_list handling for { target c++26 }

As with c++23, we want to run { target c++26 } tests even though it isn't
part of the default std_list.

C++17 with Concepts TS is no longer an interesting target configuration.

And bump the impcx target to use C++26 mode instead of 23.

gcc/testsuite/ChangeLog:

* lib/g++-dg.exp (g++-dg-runtest): Update for C++26.

RISC-V: Support floating-point vfwadd/vfwsub vv/wv combine lowering

Currently, vfwadd.wv is the pattern with (set (reg) (float_extend:(reg)) which makes
combine pass faile to combine.

change RTL format of vfwadd.wv ------> (set (float_extend:(reg) (reg)) so that combine
PASS can combine.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adapt expand.
* config/riscv/vector.md (@pred_single_widen_<plus_minus:optab><mode>):
Remove.
(@pred_single_widen_add<mode>): New pattern.
(@pred_single_widen_sub<mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-1.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c: New test.

i386: Fix mvc17.c test for default target clone under --with-arch

For target clones, the default clone follows the default march
so adjust the testcase to avoid test failure on --with-arch=native
build.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mvc17.c: Add -march=x86-64 to dg-options.

Issue a warning for conversion between short and __bf16 under TARGET_AVX512BF16.

__bfloat16 is redefined from typedef short to real __bf16 since GCC
V13. The patch issues an warning for potential silent implicit
conversion between __bf16 and short where users may only expect a
data movement.

To avoid too many false positive, warning is only under
TARGET_AVX512BF16.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_invalid_conversion): New function.
(TARGET_INVALID_CONVERSION): Define as
ix86_invalid_conversion.

gcc/testsuite/ChangeLog:

* gcc.target/i386/bf16_short_warn.c: New test.

Daily bump.

RISC-V: Add autovect widening/narrowing Integer/FP conversions.

This patch implements widening and narrowing float-to-int and
int-to-float conversions and adds tests.

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><vnconvert><mode>2): New
expander.
(<float_cvt><vnconvert><mode>2): Ditto.
(<optab><mode><vnconvert>2): Ditto.
(<float_cvt><mode><vnconvert>2): Ditto.
* config/riscv/vector-iterators.md: Add vnconvert.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: New test.

RISC-V: Add autovec FP widening/narrowing.

This patch adds FP widening and narrowing expanders as well as tests.
Conceptually similar to integer extension/truncation, we emulate
_Float16 -> double by two vfwcvts and double -> _Float16 by two vfncvts.

gcc/ChangeLog:

* config/riscv/autovec.md (extend<v_double_trunc><mode>2): New
expander.
(extend<v_quad_trunc><mode>2): Ditto.
(trunc<mode><v_double_trunc>2): Ditto.
(trunc<mode><v_quad_trunc>2): Ditto.
* config/riscv/vector-iterators.md: Add VQEXTF and HF to
V_QUAD_TRUNC and v_quad_trunc.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c: New test.

RISC-V: Add autovec FP int->float conversion.

This patch adds the autovec expander for vfcvt.f.x.v and tests for it.

gcc/ChangeLog:

* config/riscv/autovec.md (<float_cvt><vconvert><mode>2): New
expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vsext-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vzext-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add int/float conversions.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c: New test.

RISC-V: Implement autovec copysign.

This adds vector copysign, ncopysign and xorsign as well as the
accompanying tests.

gcc/ChangeLog:

* config/riscv/autovec.md (copysign<mode>3): Add expander.
(xorsign<mode>3): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc (class vfsgnjn):
New class.
* config/riscv/vector-iterators.md (copysign): Remove ncopysign.
(xorsign): Ditto.
(n): Ditto.
(x): Ditto.
* config/riscv/vector.md (@pred_ncopysign<mode>): Split off.
(@pred_ncopysign<mode>_scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/copysign-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-template.h: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c: New test.

RISC-V: Split VF iterators for Zvfh(min).

When working on FP widening/narrowing I realized the Zvfhmin handling
is not ideal right now:  We use the "enabled" insn attribute to disable
instructions not available with Zvfhmin but only with Zvfh.

However, "enabled == 0" only disables insn alternatives, in our case all
of them when the mode is a HFmode.  The insn itself remains available
(e.g. for combine to match) and we end up with an insn without alternatives
that reload cannot handle --> ICE.

The proper solution is to disable the instruction for the respective
mode altogether.  This patch achieves this by splitting the VF as well
as VWEXTF iterators into variants with TARGET_ZVFH and
TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN are true).  Also, VWCONVERTI, VHF and VHF_LMUL1 need
adjustments.

gcc/ChangeLog:

* config/riscv/autovec.md: VF_AUTO -> VF.
* config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
VHF_LMUL1.
* config/riscv/vector.md: Use new iterators.

match.pd: Use element_mode instead of TYPE_MODE.

This patch changes TYPE_MODE into element_mode in a match.pd
simplification.  As the simplification can be also called with vector
types real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT which
expects a scalar mode.  Therefore, use element_mode instead of
TYPE_MODE.

Additionally, check if the target supports the resulting operation.  One
target that supports e.g. a float addition but not a _Float16 addition
is the RISC-V vector extension Zvfhmin.

gcc/ChangeLog:

* match.pd: Use element_mode and check if target supports
operation with new type.

[SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant.

gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc
(svdupq_impl::fold_nonconst_dupq): New method.
(svdupq_impl::fold): Call fold_nonconst_dupq.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/dupq_11.c: New test.

Mark asm goto with outputs as volatile

The manual references asm goto as being implicitly volatile already
and that was done when asm goto could not have outputs. When outputs
were added to `asm goto`, only asm goto without outputs were still being
marked as volatile. Now some parts of GCC decide, removing the `asm goto`
is ok if the output is not used, though not updating the CFG (this happens
on both the RTL level and the gimple level). Since the biggest user of `asm goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/110420
PR middle-end/103979
PR middle-end/98619

gcc/ChangeLog:

* gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/asmgoto-6.c: New test.

ada: Fix build of GNAT tools

gcc/ada/

* gcc-interface/Makefile.in (LIBIBERTY): Fix condition.
(TOOLS_LIBS): Add @LD_PICFLAG@.

ada: Fix bad interaction between inlining and thunk generation

This may cause the type of the RESULT_DECL of a function which returns by
invisible reference to be turned into a reference type twice.

gcc/ada/

* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Add guard to the
code turning the type of the RESULT_DECL into a reference type.
(maybe_make_gnu_thunk): Use a more precise guard in the same case.

ada: Make the identification of case expressions more robust

gcc/ada/

* gcc-interface/trans.cc (Case_Statement_to_gnu): Rename boolean
constant and use From_Conditional_Expression flag for its value.

ada: Fix double finalization of case expression in concatenation

This streamlines the expansion of case expressions by not wrapping them in
an Expression_With_Actions node when the type is not by copy, which avoids
the creation of a temporary and the associated finalization issues.

That's the same strategy as the one used for the expansion of if expressions
when the type is by reference, unless Back_End_Handles_Limited_Types is set
to True. Given that it is never set to True, except by a debug switch, and
has never been implemented, this parameter is removed in the process.

gcc/ada/

* debug.adb (d.L): Remove documentation.
* exp_ch4.adb (Expand_N_Case_Expression): In the not-by-copy case,
do not wrap the case statement in an Expression_With_Actions node.
(Expand_N_If_Expression): Do not test
Back_End_Handles_Limited_Types
* gnat1drv.adb (Adjust_Global_Switches): Do not set it.
* opt.ads (Back_End_Handles_Limited_Types): Delete.

ada: Fix incorrect handling of iterator specifications in recent change

Unlike for loop parameter specifications where it references an index, the
defining identifier references an element in them.

gcc/ada/

* sem_ch12.adb (Check_Generic_Actuals): Check the component type
of constants and variables of an array type.
(Copy_Generic_Node): Fix bogus handling of iterator
specifications.

ada: Correct the contract of Ada.Text_IO.Get_Line

Item might not be entirely initialized after a call to Get_Line.

gcc/ada/

* libgnat/a-textio.ads (Get_Line): Use Relaxed_Initialization on
the Item parameter of Get_Line.

ada: Fix too late finalization and secondary stack release in iterator loops

Sem_Ch5 contains an entire machinery to deal with finalization actions and
secondary stack releases around iterator loops, so this removes a recent
fix that was made in a narrower case and instead refines the condition under
which this machinery is triggered.

As a side effect, given that finalization and secondary stack management are
still entangled in this machinery, this also fixes the counterpart of a leak
for the former, which is a finalization occurring too late.

gcc/ada/

* exp_ch4.adb (Expand_N_Quantified_Expression): Revert the latest
change as it is subsumed by the machinery in Sem_Ch5.
* sem_ch5.adb (Prepare_Iterator_Loop): Also wrap the loop
statement in a block in the name contains a function call that
returns on the secondary stack.

ada: Plug small loophole in the handling of private views in instances

This deals with nested instantiations in package bodies.

gcc/ada/

* sem_ch12.adb (Scope_Within_Body_Or_Same): New predicate.
(Check_Actual_Type): Take into account packages nested in bodies
to compute the enclosing scope by means of
Scope_Within_Body_Or_Same.

ada: Plug another loophole in the handling of private views in instances

This deals with discriminants of types declared in package bodies.

gcc/ada/

* sem_ch12.adb (Check_Private_View): Also check the type of
visible discriminants in record and concurrent types.

ada: Update printing container aggregates for debugging

All N_Aggregate nodes were printed with parentheses "()". However
the new container aggregates (homogeneous N_Aggregate nodes) should
be printed with brackets "[]".

gcc/ada/

* sprint.adb (Print_Node_Actual): Print homogeneous N_Aggregate
nodes with brackets.

ada: Fix expanding container aggregates

Ensure that that container aggregate expressions are expanded as
such and not as records even if the type of the expression is a
record.

gcc/ada/

* exp_aggr.adb (Expand_N_Aggregate): Ensure that container
aggregate expressions do not get expanded as records but instead
as container aggregates.

Convert remaining uses of value_range in ipa-*.cc to Value_Range.

Minor cleanups to get rid of value_range in IPA. There's only one left,
but it's in the switch code which is integer specific.

gcc/ChangeLog:

* ipa-cp.cc (decide_whether_version_node): Adjust comment.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
for Value_Range.
(set_switch_stmt_execution_predicate): Same.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.

Implement ipa_vr hashing.

Implement hashing for ipa_vr. When all is said and done, all these
patches incurr a 7.64% slowdown for ipa-cp, with is entirely covered by
the similar 7% increase in this area last week. So we get type agnostic
ranges with "infinite" range precision close to free.

There is no change in overall compilation.

gcc/ChangeLog:

* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Adjust for use with
ipa_vr instead of value_range.
(gt_pch_nx): Same.
(gt_ggc_mx): Same.
(ipa_get_value_range): Same.
* value-range.cc (gt_pch_nx): Move to ipa-prop.cc and adjust for
ipa_vr.
(gt_ggc_mx): Same.

Convert ipa_jump_func to use ipa_vr instead of a value_range.

This patch converts the ipa_jump_func code to use the type agnostic
ipa_vr suitable for GC instead of value_range which is integer specific.

I've disabled the range cacheing to simplify the patch for review, but
it is handled in the next patch in the series.

gcc/ChangeLog:

* ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
* ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
(ipa_set_jfunc_vr): Take a range.
(ipa_compute_jump_functions_for_edge): Pass range to
ipa_set_jfunc_vr.
(ipa_write_jump_function): Call streamer write helper.
(ipa_read_jump_function): Call streamer read helper.
* ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.

gengtype: Handle braced initialisers in structs

I have a patch that adds braced initialisers to a GTY structure.
gengtype didn't accept that, because it parsed the "{ ... }" in
" = { ... };" as the end of a statement (as "{ ... }" would be in
a function definition) and so it didn't expect the following ";".

This patch explicitly handles initialiser-like sequences.

Arguably, the parser should also skip redundant ";", but that
feels more like a workaround rather than the real fix.

gcc/
* gengtype-parse.cc (consume_until_comma_or_eos): Parse "= { ... }"
as a probable initializer rather than a probable complete statement.

tree-optimization/96208 - SLP of non-grouped loads

The following extends SLP discovery to handle non-grouped loads
in loop vectorization in the case the same load appears in all
lanes.

Code generation is adjusted to mimick what we do for the case
of single element interleaving (when the load is not unit-stride)
which is already handled by SLP.  There are some limits we
run into because peeling for gap cannot cover all cases and
we choose VMAT_CONTIGUOUS.  The patch does not try to address
these issues yet.

The main obstacle is that these loads are not
STMT_VINFO_GROUPED_ACCESS and that's a new thing with SLP.
I know from the past that it's not a good idea to make them
grouped.  Instead the following massages places to deal
with SLP loads that are not STMT_VINFO_GROUPED_ACCESS.

There's already a testcase testing for the case the PR
is after, just XFAILed, the following adjusts that instead
of adding another.

I do expect to have missed some so I don't plan to push this
on a Friday.  Still there may be feedback, so posting this
now.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/96208
* tree-vect-slp.cc (vect_build_slp_tree_1): Allow
a non-grouped load if it is the same for all lanes.
(vect_build_slp_tree_2): Handle not grouped loads.
(vect_optimize_slp_pass::remove_redundant_permutations):
Likewise.
(vect_transform_slp_perm_load_1): Likewise.
* tree-vect-stmts.cc (vect_model_load_cost): Likewise.
(get_group_load_store_type): Likewise.  Handle
invariant accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/slp-46.c: Adjust for new vectorizations.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.

Refine maskstore patterns with UNSPEC_MASKMOV.

Similar like r14-2070-gc79476da46728e

If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent
it to be transformed to any other whole memory access instructions.

gcc/ChangeLog:

PR rtl-optimization/110237
* config/i386/sse.md (<avx512>_store<mode>_mask): Refine with
UNSPEC_MASKMOV.
(maskstore<mode><avx512fmaskmodelower): Ditto.
(*<avx512>_store<mode>_mask): New define_insn, it's renamed
from original <avx512>_store<mode>_mask.

Make option mvzeroupper independent of optimization level.

pass_insert_vzeroupper is under condition

TARGET_AVX && TARGET_VZEROUPPER
&& flag_expensive_optimizations && !optimize_size

But the document of mvzeroupper doesn't mention the insertion
required -O2 and above, it may confuse users when they explicitly
use -Os -mvzeroupper.

------------
mvzeroupper
Target Mask(VZEROUPPER) Save
Generate vzeroupper instruction before a transfer of control flow out of
the function.
------------

The patch moves flag_expensive_optimizations && !optimize_size to
ix86_option_override_internal. It makes -mvzeroupper independent of
optimization level, but still keeps the behavior of architecture
tuning(emit_vzeroupper) unchanged.

gcc/ChangeLog:

* config/i386/i386-features.cc (pass_insert_vzeroupper:gate):
Move flag_expensive_optimizations && !optimize_size to ..
* config/i386/i386-options.cc (ix86_option_override_internal):
.. this, it makes -mvzeroupper independent of optimization
level, but still keeps the behavior of architecture
tuning(emit_vzeroupper) unchanged.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-29.c: New testcase.

Don't issue vzeroupper for vzeroupper call_insn.

gcc/ChangeLog:

PR target/82735
* config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit
vzeroupper for vzeroupper call_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-30.c: New test.

Fix __builtin_alloca_with_align_and_max defbuiltin usage

There is a missing space between the return type and the name
which causes the name not to be outputted in the html docs.

Committed as obvious after building html docs.

gcc/ChangeLog:

* doc/extend.texi (__builtin_alloca_with_align_and_max): Fix
defbuiltin usage.

Daily bump.

RISC-V: Support const vector expansion with step vector with base != 0

Currently, we are able to generate step vector with base == 0:
{ 0, 0, 2, 2, 4, 4, ... }

ASM:

vid
vand

However, we do wrong for step vector with base != 0:
{ 1, 1, 3, 3, 5, 5, ... }

Before this patch, such case will run fail.

After this patch, we are able to pass the testcase and generate the step vector with asm:

vid
vand
vadd

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix stepped vector
with base != 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-19.c: New test.

docs: Add @cindex for some attributes

While looking for the access attribute,
I tried to find it via the concept index but it was
missing. This patch fixes that and adds one for
interrupt/interrupt_handler too.

Committed as obvious after building the HTML docs
and looking at the resulting concept index page.

gcc/ChangeLog:

* doc/extend.texi (access attribute): Add
cindex for it.
(interrupt/interrupt_handler attribute):
Likewise.

libstdc++: Synchronize PSTL with upstream

This patch rebases the C++17 parallel algorithms implementation (pstl)
against the current upstream version, commit 843c12d6a.

This version does not currently include the recently added OpenMP
backend, that will be considered for a future version.

libstdc++-v3/ChangeLog:
* include/pstl/algorithm_fwd.h: Synchronize with upstream.
* include/pstl/algorithm_impl.h: Likewise.
* include/pstl/execution_defs.h: Likewise.
* include/pstl/execution_impl.h: Likewise.
* include/pstl/glue_algorithm_impl.h: Likewise.
* include/pstl/glue_execution_defs.h: Likewise.
* include/pstl/glue_memory_impl.h: Likewise.
* include/pstl/glue_numeric_impl.h: Likewise.
* include/pstl/memory_impl.h: Likewise.
* include/pstl/numeric_fwd.h: Likewise.
* include/pstl/numeric_impl.h: Likewise.
* include/pstl/parallel_backend.h: Likewise.
* include/pstl/parallel_backend_serial.h: Likewise.
* include/pstl/parallel_backend_tbb.h: Likewise.
* include/pstl/parallel_impl.h: Likewise.
* include/pstl/pstl_config.h: Likewise.
* include/pstl/unseq_backend_simd.h: Likewise.
* include/pstl/utils.h: Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_construct.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_copy_move.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_fill_destroy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/inplace_merge.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/merge.cc: Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/copy_if.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/copy_move.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/fill.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/generate.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/is_partitioned.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/partition.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/partition_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/remove.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/remove_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/replace.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/replace_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/swap_ranges.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_unary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/unique.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/unique_copy_equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/adjacent_find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/all_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/any_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/count.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_end.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_first_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_if.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/for_each.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/none_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/search_n.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_heap.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_sorted.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/sort.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/adjacent_difference.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/reduce.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/scan.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/transform_reduce.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/transform_scan.cc:
Likewise.
* testsuite/util/pstl/test_utils.h:
Likewise.

compiler: support -fgo-importcfg

* lang.opt (fgo-importcfg): New option.
* go-c.h (struct go_create_gogo_args): Add importcfg field.
* go-lang.cc (go_importcfg): New static variable.
(go_langhook_init): Set args.importcfg.
(go_langhook_handle_option): Handle -fgo-importcfg.
* gccgo.texi (Invoking gccgo): Document -fgo-importcfg.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/506095

aarch64: Use <DWI> instead of <V2XWIDE> in scalar SQRSHRUN pattern

In the scalar pattern for SQRSHRUN it's a bit clearer to use DWI instead of V2XWIDE
to make it more clear that no vector modes are involved.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqrshrun_n<mode>_insn):
Use <DWI> instead of <V2XWIDE>.
(aarch64_sqrshrun_n<mode>): Likewise.

aarch64: Clean up some rounding immediate predicates

aarch64_simd_rsra_rnd_imm_vec is now used for more than just RSRA
and accepts more than just vectors so rename it to make it more
truthful.
The aarch64_simd_rshrn_imm_vec is now unused and can be deleted.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec):
Rename to...
(aarch64_int_rnd_operand): ... This.
(aarch64_simd_rshrn_imm_vec): Delete.
* config/aarch64/aarch64-simd.md (aarch64_<sra_op>rsra_n<mode>_insn):
Adjust for the above.
(aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): Likewise.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise.
(*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_le): Likewise.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_be): Likewise.
(aarch64_sqrshrun2_n<mode>_insn_le): Likewise.
(aarch64_sqrshrun2_n<mode>_insn_be): Likewise.
* config/aarch64/aarch64.cc (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.

libstdc++: Fix std::format for pointers [PR110239]

The formatter for pointers was casting to uint64_t which sign extends a
32-bit pointer and produces a value that won't fit in the provided
buffer. Cast to uintptr_t instead.

There was also a bug in the __parse_integer helper when converting a
wide string to a narrow string in order to use std::from_chars on it.
The function would always try to read 32 characters, even if the format
string was shorter than that. Fix that bug, and remove the constexpr
implementation of __parse_integer by just using __from_chars_alnum
instead of from_chars, because that's usable in constexpr even in
C++20.

libstdc++-v3/ChangeLog:

PR libstdc++/110239
* include/std/format (__format::__parse_integer): Fix buffer
overflow for wide chars.
(formatter<const void*, C>::format): Cast to uintptr_t instead
of uint64_t.
* testsuite/std/format/string.cc: Test too-large widths.

libstdc++: Implement P2538R1 ADL-proof std::projected

This was recently approved for C++26, but there's no harm in
implementing it unconditionally for C++20 and C++23. As it says in the
paper, it doesn't change the meaning of any valid code. It only enables
things that were previously ill-formed for questionable reasons.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (projected): Replace class
template with alias template denoting an ADL-proofed helper.
(incremental_traits<projected<Iter, Proj>>): Remove.
* testsuite/24_iterators/indirect_callable/projected-adl.cc:
New test.

libstdc++: Qualify calls to debug mode helpers

These functions should be qualified to disable unwanted ADL.

The overload of __check_singular_aux for safe iterators was previously
being found by ADL, because it wasn't declared before __check_singular.
Add a declaration so that it can be found by qualified lookup.

libstdc++-v3/ChangeLog:

* include/debug/helper_functions.h (__get_distance)
(__check_singular, __valid_range_aux, __valid_range): Qualify
calls to disable ADL.
(__check_singular_aux(const _Safe_iterator_base*)): Declare
overload that was previously found via ADL.

IBM zSystems: Assume symbols without explicit alignment to be ok

A change we have committed back in 2015 relies on the backend
requested ABI alignment to be applied to ALL symbols by the
middle-end. However, this does not appear to be the case for external
symbols. With this commit we assume all symbols without explicit
alignment to be aligned according to the ABI. That's the behavior we
had before.
This fixes a performance regression caused by the 2015 patch. Since
then the address of external char type symbols have been pushed to the
literal pool, although it is safe to access them with larl (which
requires symbols to reside at even addresses).

gcc/
* config/s390/s390.cc (s390_encode_section_info): Set
SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitely been
misaligned.

gcc/testsuite/
* gcc.target/s390/larl-1.c: New test.

Fix profile of forwarders produced by cd-dce

compiling the testcase from PR109849 (which uses std:vector based stack to
drive a loop) with profile feedbakc leads to profile mismatches introduced by
tree-ssa-dce. This is the new code to produce unified forwarder blocks for
PHIs.

I am not including the testcase itself since
checking it for Invalid sum is probably going to be too fragile and this should
show in our LNT testers. The patch however fixes the mismatch.

Bootstrapped/regtested x86_64-linux and plan to commit it shortly.

gcc/ChangeLog:

PR tree-optimization/109849
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile
count of newly constructed forwarder block.

docs: Fix typo

gcc/ChangeLog:

* doc/optinfo.texi: Fix "steam" -> "stream".

DSE: Add LEN_MASK_STORE analysis into DSE and fix LEN_STORE

Hi, Richi.

This patch is adding LEN_MASK_STORE into DSE.

My understanding is LEN_MASK_STORE is predicated by mask and len.
No matter len is constant or not, the ao_ref should be the same as MASK_STORE.

Wheras for LEN_STORE, when len is constant, we use (len - bias), otherwise, it's
the same as MASK_STORE/LEN_MASK_STORE.

Not sure whether I am on the same page with you, feel free to correct me.

Thanks.

gcc/ChangeLog:

* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE and
fix LEN_STORE.
(dse_optimize_stmt): Add LEN_MASK_STORE.

GIMPLE_FOLD: Fix gimple fold for LEN_{MASK}_{LOAD,STORE}

Hi, previous I made a mistake on GIMPLE_FOLD of LEN_MASK_{LOAD,STORE}.

We should fold LEN_MASK_{LOAD,STORE} (bias+len) == vf (nunits instead of bytesize) && mask = all trues mask

into:
MEM_REF [...].

This patch added testcase to test gimple fold of LEN_MASK_{LOAD,STORE}.

Also, I fix LEN_LOAD/LEN_STORE, to make them have the same behavior.

Ok for trunk ?

gcc/ChangeLog:

* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Fix gimple
fold of LOAD/STORE with length.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: New test.

Avoid redundant GORI calcuations.

When GORI evaluates a statement, if operand 1 and 2 are both in the
dependency chain, GORI evaluates the name through both operands sequentially
and combines the results.

If either operand is in the dependency chain of the other, this
evaluation will do the same work twice, for questionable gain.
Instead, simple evaluate only the operand which depends on the other
and keep the evaluation linear in time.

* gimple-range-gori.cc (compute_operand1_and_operand2_range):
Check for interdependence between operands 1 and 2.

vect: Cost intermediate conversions

g:6f19cf7526168f8 extended N-vector to N-vector conversions
to handle cases where an intermediate integer extension or
truncation is needed. This patch adjusts the cost to account
for these intermediate conversions.

gcc/
* tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt
into account when costing non-widening/truncating conversions.

tree-optimization/110381 - preserve SLP permutation with in-order reductions

The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.

Now, reduction analysis has not yet been performend when optimizing
permutations so we have to resort to check that ourselves.

PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.

* gcc.dg/vect/pr110381.c: New testcase.

RISC-V: Remove duplicated extern function_base decl

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.

narrowing initializers and initializer_constant_valid_p_1

initializer_constant_valid_p_1 attempts to handle narrowing
differences and sums but fails to handle when the overall
value looks like

  VIEW_CONVERT_EXPR<long long int>(NON_LVALUE_EXPR <v>
    -  VEC_COND_EXPR < { 0, 0 } == { 0, 0 } , { -1, -1 } , { 0, 0 } > )

where endtype is scalar integer but value is a vector type.
In this particular case all is good and we recurse since
two vector lanes is more than 64bits of long long.  But still
it compares apples and oranges.

Fixed by appropriately also requiring the type of the
value to be scalar integral.

* varasm.cc (initializer_constant_valid_p_1): Also
constrain the type of value to be scalar integral
before dispatching to narrowing_initializer_constant_valid_p.

Avoid shorten_binary_op on VECTOR_TYPE

When we disallow TYPE_PRECISION on VECTOR_TYPEs it shows that
shorten_binary_op performs some checks on that that are likely
harmless in the end. The following bails out early for
VECTOR_TYPE operations to avoid those questionable checks.

gcc/c-family/
* c-common.cc (shorten_binary_op): Exit early for VECTOR_TYPE
operations.

Fix TYPE_PRECISION use in hashable_expr_equal_p

While the checks look unnecessary they probably are quick and
thus done early. The following avoids using TYPE_PRECISION
on VECTOR_TYPEs by making the code match the comment which
talks about precision and signedness. An alternative would
be to only retain the ERROR_MARK and TYPE_MODE checks or
use TYPE_PRECISION_RAW (but I like that least).

* tree-ssa-scopedtables.cc (hashable_expr_equal_p):
Use element_precision.

RISC-V: Remove redundant vcond patterns

Previously, Richi has suggested that vcond patterns are only needed when target
support comparison + select consuming 1 instruction.

Now, I do the experiments on removing those "vcond" patterns, it works perfectly.

All testcases PASS.

Really appreicate Richi helps us recognize such issue.

Now remove all "vcond" patterns as Richi suggested.

gcc/ChangeLog:

* config/riscv/autovec.md (vcond<V:mode><VI:mode>): Remove redundant
vcond patterns.
(vcondu<V:mode><VI:mode>): Ditto.
* config/riscv/riscv-protos.h (expand_vcond): Ditto.
* config/riscv/riscv-v.cc (expand_vcond): Ditto.

tree-optimization/110392 - ICE with predicate analysis

Feeding not optimized IL can result in predicate normalization
to simplify things so a predicate can get true or false. The
following re-orders the early exit in that case to come after
simplification and normalization to take care of that.

PR tree-optimization/110392
* gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded):
Do early exits on true/false predicate only after normalization.