git.ipfire.org Git - thirdparty/gcc.git/log

libiberty: writeargv: Simplify function error mode.

writeargv can be simplified by getting rid of the error exit mode
that was only relevant many years ago when the function used
to open the file descriptor internally.

0001-libiberty-writeargv-Simplify-function-error-mode.patch

From 1271552baee5561fa61652f4ca7673c9667e4f8f Mon Sep 17 00:00:00 2001
From: Costas Argyris <costas.argyris@gmail.com>
Date: Mon, 5 Jun 2023 15:02:06 +0100
Subject: [PATCH] libiberty: writeargv: Simplify function error mode.

The goto-based error mode was based on a previous version
of the function where it was responsible for opening the
file, so it had to close it upon any exit:

https://inbox.sourceware.org/gcc-patches/20070417200340.GM9017@sparrowhawk.codesourcery.com/

(thanks pinskia)

This is no longer the case though since now the function
takes the file descriptor as input, so the exit mode on
error can be just a simple return 1 statement.

libiberty/
* argv.c (writeargv): Simplify & remove gotos.

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>

bootstrap rtl-checking: Fix XVEC vs XVECEXP in postreload.cc

PR bootstrap/110120
* postreload.cc (reload_cse_move2add, move2add_use_add2_insn): Use
XVECEXP, not XEXP, to access first item of a PARALLEL.

RISC-V] add TC for save-restore cfi directives.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-cfi.c: New test to check save-restore
cfi directives.

RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

This patch support the intrinsic API of FP16 ZVFH Reduction floating-point.
Aka SEW=16 for below instructions:

vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum

Then users can leverage the instrinsic APIs to perform the FP=16 related
reduction operations. Please note not all the instrinsic APIs are coverred
in the test files, only pick some typical ones due to too many. We will
perform the FP16 related instrinsic API test entirely soon.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.

[RISC-V] correct machine mode in save-restore cfi RTL.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Use Pmode
for cfi reg/mem machmode
(riscv_adjust_libcall_cfi_epilogue): Use Pmode for cfi reg machmode

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode
for cfi reg/mem.

RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

gcc/ChangeLog:

* config/riscv/vector-iterators.md:
Fix 'REQUIREMENT' for machine_mode 'MODE'.
* config/riscv/vector.md (@pred_indexed_<order>store<VNX16_QHS:mode>
<VNX16_QHSI:mode>): change VNX16_QHSI to VNX16_QHSDI.
(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Ditto.

RISC-V: Fix some typo in vector-iterators.md

This patch would like to fix some typo in vector-iterators.md, aka:

[-"vnx1DI")-]{+"vnx1di")+}
[-"vnx2SI")-]{+"vnx2si")+}
[-"vnx1SI")-]{+"vnx1si")+}

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix typo in mode attr.

Daily bump.

Remove widen_plus/minus_expr tree codes

This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.

2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>

gcc/ChangeLog:

* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.

internal-fn,vect: Refactor widen_plus as internal_fn

     DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively. With the exception that they provide convenience wrappers
for a single vector to vector conversion, a hi/lo split or an even/odd
split.  Each definition for <NAME> will require either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.

      For example, for widening addition the
DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD. Each requiring two
optabs, one for signed and one for unsigned.
      Aarch64 implements the hi/lo split optabs:
      IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
      IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

     This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
    Joel Hutton  <joel.hutton@arm.com>
    Tamar Christina  <tamar.christina@arm.com>

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ...to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
IFN_VEC_WIDEN_MINUS_EVEN): Define widening  plus,minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(Narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.

vect: Refactor to allow internal_fn's

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns

2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>

gcc/ChangeLog:
* tree-vect-patterns.cc: Add include for gimple-iterator.
(vect_recog_widen_op_pattern): Refactor to use code_helper.
(vect_gimple_build): New function.
* tree-vect-stmts.cc (simple_integer_narrowing): Refactor to use
code_helper.
(vectorizable_call): Likewise.
(vect_gen_widened_results_half): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vect_create_vectorized_promotion_stmts): Likewise.
(vect_create_half_widening_stmts): Likewise.
(vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vectorizer.h (supportable_widening_operation): Change
prototype to use code_helper.
(supportable_narrowing_operation): Likewise.
(vect_gimple_build): New function prototype.
* tree.h (code_helper::safe_as_tree_code): New function.
(code_helper::safe_as_fn_code): New function.

d: Warn when declared size of a special enum does not match its intrinsic type.

All special enums have declarations in the D runtime library, but the
compiler will recognize and treat them specially if declared in any
module. When the underlying base type of a special enum is a different
size to its matched intrinsic, then this can cause undefined behavior at
runtime. Detect and warn about when such a mismatch occurs.

gcc/d/ChangeLog:

* gdc.texi (Warnings): Document -Wextra and -Wmismatched-special-enum.
* implement-d.texi (Special Enums): Add reference to warning option
-Wmismatched-special-enum.
* lang.opt: Add -Wextra and -Wmismatched-special-enum.
* types.cc (TypeVisitor::visit (TypeEnum *)): Warn when declared
special enum size mismatches its intrinsic type.

gcc/testsuite/ChangeLog:

* gdc.dg/Wmismatched_enum.d: New test.

New wi::bitreverse function.

This patch provides a wide-int implementation of bitreverse, that
implements both of Richard Sandiford's suggestions from the review at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html of an
improved API (as a stand-alone function matching the bswap refactoring),
and an implementation that works with any bit-width precision.

2023-06-05 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): New function implementing
bit reversal of an integer.
* wide-int.h (wi::bitreverse): New (template) function prototype.
(bitreverse_large): Prototype helper function/implementation.
(wi::bitreverse): New template wrapper around bitreverse_large.

Testsuite: Fix a fail about xtheadcondmov-indirect-rv64.c

I find fail of the xtheadcondmov-indirect-rv64.c test case and provide a way to solve it.
In this patch, I take Kito's advice that I modify the form of the function bodies.It likes
*[a-x0-9].

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Generalize to be
less sensitive to register allocation choices.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Similarly.

print-rtl: Change return type of two print functions from int to void

Also change one internal variable to bool.

gcc/ChangeLog:

* rtl.h (print_rtl_single): Change return type from int to void.
(print_rtl_single_with_indent): Ditto.
* print-rtl.h (class rtx_writer): Ditto. Change m_sawclose to bool.
* print-rtl.cc (rtx_writer::rtx_writer): Update for m_sawclose change.
(rtx_writer::print_rtx_operand_code_0): Ditto.
(rtx_writer::print_rtx_operand_codes_E_and_V): Ditto.
(rtx_writer::print_rtx_operand_code_i): Ditto.
(rtx_writer::print_rtx_operand_code_u): Ditto.
(rtx_writer::print_rtx_operand): Ditto.
(rtx_writer::print_rtx): Ditto.
(rtx_writer::finish_directive): Ditto.
(print_rtl_single): Change return type from int to void
and adjust function body accordingly.
(rtx_writer::print_rtl_single_with_indent): Ditto.

reginfo: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* rtl.h (reg_classes_intersect_p): Change return type from int to bool.
(reg_class_subset_p): Ditto.
* reginfo.cc (reg_classes_intersect_p): Ditto.
(reg_class_subset_p): Ditto.

libiberty: pex-win32.c: Fix some typos.

libiberty/ChangeLog:

* pex-win32.c: fix typos.

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>

RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

This patch support the intrinsic API of FP16 ZVFH floating-point. Aka
SEW=16 for below instructions:

vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt

Then users can leverage the instrinsic APIs to perform the FP=16 related
operations. Please note not all the instrinsic APIs are coverred in the
test files, only pick some typical ones due to too many. We will perform
the FP16 related instrinsic API test entirely soon.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP=16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

libiberty: On Windows, pass a >32k cmdline through a response file.

pex-win32.c (win32_spawn): If the command line for CreateProcess
exceeds the 32k Windows limit, try to store it in a temporary
response file and call CreateProcess with @file instead (PR71850).

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
libiberty/ChangeLog:

* pex-win32.c (win32_spawn): Check command line length
and generate a response file if necessary.
(spawn_script): Adjust parameters.
(pex_win32_exec_child): Ditto.

Signed-off-by: Jonathan Yong <10walls@gmail.com>

Fix PR 110085: `make clean` in GCC directory on sh target causes a failure

On sh target, there is a MULTILIB_DIRNAMES (or is it MULTILIB_OPTIONS) named m2,
this conflicts with the langauge m2. So when you do a `make clean`, it will remove
the m2 directory and then a build will fail. Now since r0-78222-gfa9585134f6f58,
the multilib directories are no longer created in the gcc directory as libgcc
was moved to the toplevel. So we can remove the part of clean that removes those
directories.

Tested on x86_64-linux-gnu and a cross to sh-elf that `make clean` followed by
`make` works again.

OK?

gcc/ChangeLog:

PR bootstrap/110085
* Makefile.in (clean): Remove the removing of
MULTILIB_DIR/MULTILIB_OPTIONS directories.

libgcc: Use initarray section type for .init_stack

One of my workmates found there is a warning like:

  libgcc/config/rs6000/morestack.S:402: Warning: ignoring
    incorrect section type for .init_array.00000

when compiling libgcc/config/rs6000/morestack.S.

Since commit r13-6545 touched that file recently, which was
suspected to be responsible for this warning, I did some
investigation and found this is a warning staying for a long
time.  For section .init_stack*, it's preferred to use
section type SHT_INIT_ARRAY.  So this patch is use
"@init_array" to replace "@progbits".

Although the warning is trivial, Segher suggested me to
post this to fix it, in order to avoid any possible
misunderstanding/confusion on the warning.

As Alan confirmed, this doesn't require a premise check
on if the existing binutils supports "@init_array" or not,
"because if you want split-stack to work, you must link
with gold, any version of binutils that has gold has an
assembler that understands @init_array". (Thanks Alan!)

libgcc/ChangeLog:

* config/i386/morestack.S: Use @init_array rather than
@progbits for section type of section .init_array.
* config/rs6000/morestack.S: Likewise.
* config/s390/morestack.S: Likewise.

MIPS: Add speculation_barrier support

speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.

gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier): New function.
* config/mips/mips.md (speculation_barrier): Call
mips_emit_speculation_barrier.

libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
included in lib1funcs.S now.

RISC-V: Reorganize riscv-v.cc

This patch is just reorganizing the functions for the following patch.

I put rvv_builder and emit_* functions located before expand_const_vector
function since I will use them in expand_const_vector in the following patch.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Reorganize functions.
(rvv_builder::can_duplicate_repeating_sequence_p): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::get_merge_scalar_mask): Ditto.
(emit_scalar_move_insn): Ditto.
(emit_vlmax_integer_move_insn): Ditto.
(emit_nonvlmax_integer_move_insn): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(get_repeating_sequence_dup_machine_mode): Ditto.

RISC-V: Split arguments of expand_vec_perm

Since the following patch will calls expand_vec_perm with
splitted arguments, change the expand_vec_perm interface in
this patch.

gcc/ChangeLog:

* config/riscv/autovec.md: Split arguments.
* config/riscv/riscv-protos.h (expand_vec_perm): Ditto.
* config/riscv/riscv-v.cc (expand_vec_perm): Ditto.

Daily bump.

Improve do_store_flag for comparing single bit against that bit

This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  t ^= (1<<30);
  return t != 0;
}
```

We should handle the case where the nonzero bits is the same as the
comparison operand.

Changes from v1:
* v2: Updated for the bit extraction changes.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.

Improve do_store_flag for single bit comparison against 0

While working something else, I noticed we could improve
the following function code generation:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
return t != 0;
}
```
Right know we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available).
This patch extends it to handle the case where we know t has a nonzero
of just one bit set.

Changes from v1:
* v2: Updated for the bit extraction improvements.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* expr.cc (do_store_flag): Extend the one bit checking case
to handle the case where we don't have an and but rather still
one bit is known to be non-zero.

Convert H8 port to LRA

With Vlad's recent LRA fix to the elimination code, the H8 can be converted
to LRA.

This patch has two changes of note.

First, this turns Zz into a standard constraint.  This helps reloading for
the H8/SX movqi pattern.

Second, this drops the whole pattern for the SX bit memory operations.  I
can't see why those exist to begin with.  They should be handled by the
standard bit manipulation patterns.   If someone wants to try and improve SX
bit support, that'd be great and they can do so within the LRA framework :-)

Pushed to the trunk...

gcc/
* config/h8300/constraints.md (Zz): Make this a normal
constraint.
* config/h8300/h8300.cc (TARGET_LRA_P): Remove.
* config/h8300/logical.md (H8/SX bit patterns): Remove.

xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN

This patch optimizes both the boolean evaluation of and the branching of
EQ/NE against INT_MIN (-2147483648), by taking advantage of the specifi-
cation the ABS machine instruction on Xtensa returns INT_MIN iff INT_MIN,
otherwise non-negative value.

    /* example */
    int test0(int x) {
      return (x == -2147483648);
    }
    int test1(int x) {
      return (x != -2147483648);
    }
    extern void foo(void);
    void test2(int x) {
      if(x == -2147483648)
        foo();
    }
    void test3(int x) {
      if(x != -2147483648)
        foo();
    }

    ;; before
    test0:
movi.n a9, -1
slli a9, a9, 31
add.n a2, a2, a9
nsau a2, a2
srli a2, a2, 5
ret.n
    test1:
movi.n a9, -1
slli a9, a9, 31
add.n a9, a2, a9
movi.n a2, 1
moveqz a2, a9, a9
ret.n
    test2:
movi.n a9, -1
slli a9, a9, 31
bne a2, a9, .L3
j.l     foo, a9
    .L3:
ret.n
    test3:
movi.n a9, -1
slli a9, a9, 31
beq a2, a9, .L5
j.l foo, a9
    .L5:
ret.n

    ;; after
    test0:
abs a2, a2
extui a2, a2, 31, 1
ret.n
    test1:
abs a2, a2
srai a2, a2, 31
addi.n a2, a2, 1
ret.n
    test2:
abs a2, a2
bbci a2, 31, .L3
j.l foo, a9
    .L3:
ret.n
    test3:
abs a2, a2
bbsi a2, 31, .L5
j.l foo, a9
    .L5:
ret.n

gcc/ChangeLog:

* config/xtensa/xtensa.md (*btrue_INT_MIN, *eqne_INT_MIN):
New insn_and_split patterns.

RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

This patch is to fix PR110109 issue. This issue happens is because:

(define_insn_and_split "*vlmul_extx2<mode>"
  [(set (match_operand:<VLMULX2> 0 "register_operand"  "=vr, ?&vr")
       (subreg:<VLMULX2>
         (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
  "TARGET_VECTOR"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
  DONE;
})

Such pattern generate such codes in insn-recog.cc:
static int
pattern57 (rtx x1)
{
  rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
  rtx x2;
  int res ATTRIBUTE_UNUSED;
  if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0))
    return -1;
...

PR110109 ICE at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0) since for scalable
RVV modes can not be accessed as SUBREG_BYTE (x1).to_constant ()

I create that patterns is to optimize the following test:
vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
  return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
}

codegen:
test_vlmul_ext_v_f32mf2_f32m2:
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
        vsetvli a5,zero,e32,mf2,ta,ma
        vle32.v v2,0(a1)
        vs2r.v  v2,0(a0)
        ret

There is a redundant 'vmv.v.i' here, Since GCC doesn't undefine IR
(unlike LLVM, LLVM has undef/poison).  For vlmul_ext_* RVV intrinsic,
GCC will initiate all zeros into register. However, I think it's not
a big issue after we support subreg livness tracking.

PR target/110109

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
* config/riscv/vector.md (@vlmul_extx2<mode>): Remove it.
(@vlmul_extx4<mode>): Ditto.
(@vlmul_extx8<mode>): Ditto.
(@vlmul_extx16<mode>): Ditto.
(@vlmul_extx32<mode>): Ditto.
(@vlmul_extx64<mode>): Ditto.
(*vlmul_extx2<mode>): Ditto.
(*vlmul_extx4<mode>): Ditto.
(*vlmul_extx8<mode>): Ditto.
(*vlmul_extx16<mode>): Ditto.
(*vlmul_extx32<mode>): Ditto.
(*vlmul_extx64<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110109-1.c: New test.
* gcc.target/riscv/rvv/base/pr110109-2.c: New test.

RISC-V: Support RVV FP16 ZVFHMIN intrinsic API

This patch support the 2 intrinsic API of FP16 ZVFHMIN extension. Aka
SEW=16 for below instructions

vfwcvt.f.f.v
vfncvt.f.f.w

Then users can leverage the instrinsic APIs to perform the conversion
between RVV vector single float point and half float point.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): Add vfloat32mf2_t type to vfncvt.f.f.w operations.
(vfloat32m1_t): Likewise.
(vfloat32m2_t): Likewise.
(vfloat32m4_t): Likewise.
(vfloat32m8_t): Likewise.
* config/riscv/riscv-vector-builtins.def: Fix typo in comments.
* config/riscv/vector-iterators.md: Add single to half machine
mode conversion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: New test.

RISC-V: Move optimization patterns into autovec-opt.md

Move all optimization patterns into autovec-opt.md to make organization
easier maintain.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*<optab>not<mode>): Move to autovec-opt.md.
(*n<optab><mode>): Ditto.
* config/riscv/autovec.md (*<optab>not<mode>): Ditto.
(*n<optab><mode>): Ditto.
* config/riscv/vector.md: Ditto.

PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV.

This patch fixes PR target/110083, an ICE-on-valid regression exposed by
my recent PTEST improvements (to address PR target/109973).  The latent
bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
or delete REG_EQUAL notes attached to COMPARE instructions.  As a result
the operands of COMPARE would be mismatched, with the register transformed
to V1TImode, but the immediate operand left as const_wide_int, which is
valid for TImode but not V1TImode.  This remained latent when the STV
conversion converted the mode of the COMPARE to CCmode, with later passes
recognizing the REG_EQUAL note is obviously invalid as the modes didn't
match, but now that we (correctly) preserve the CCZmode on COMPARE, the
mismatched operand modes trigger a sanity checking ICE downstream.

Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.

Before:
    (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
        (const_wide_int 0x80000000000000000000000000000000))

After:
    (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
        (const_vector:V1TI [
            (const_wide_int 0x80000000000000000000000000000000)
         ]))

2023-06-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/110083
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Update or delete REG_EQUAL notes, converting CONST_INT and
CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.

gcc/testsuite/ChangeLog
PR target/110083
* gcc.target/i386/pr110083.c: New test case.

c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

[except.handle]/7 says that when we enter std::terminate due to a throw,
that is considered an active handler.  We already implemented that properly
for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch
before std::terminate) and the case of finding a callsite with no landing
pad (the personality function calls __cxa_call_terminate which calls
__cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept
function, we were emitting a cleanup that calls std::terminate directly
without ever calling __cxa_begin_catch to handle the exception.

A straightforward way to fix this seems to be calling __cxa_call_terminate
instead.  However, that requires exporting it from libstdc++, which we have
not previously done.  Despite the name, it isn't actually part of the ABI
standard.  Nor is __cxa_call_unexpected, as far as I can tell, but that one
is also used by clang.  For this case they use __clang_call_terminate; it
seems reasonable to me for us to stick with __cxa_call_terminate.

I also change __cxa_call_terminate to take void* for simplicity in the front
end (and consistency with __cxa_call_unexpected) but that isn't necessary if
it's undesirable for some reason.

This patch does not fix the issue that representing the noexcept as a
cleanup is wrong, and confuses the handler search; since it looks like a
cleanup in the EH tables, the unwinder keeps looking until it finds the
catch in main(), which it should never have gotten to.  Without the
try/catch in main, the unwinder would reach the end of the stack and say no
handler was found.  The noexcept is a handler, and should be treated as one,
as it is when the landing pad is omitted.

The best fix for that issue seems to me to be to represent an
ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an
ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification).  The
actual code generation shouldn't need to change (apart from the change made
by this patch), only the action table entry.

PR c++/97720

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN.
(call_terminate_fn): New macro.
* cp-gimplify.cc (gimplify_must_not_throw_expr): Use it.
* except.cc (init_exception_processing): Set it.
(cp_protect_cleanup_actions): Return it.

gcc/ChangeLog:

* tree-eh.cc (lower_resx): Pass the exception pointer to the
failure_decl.
* except.h: Tweak comment.

libstdc++-v3/ChangeLog:

* libsupc++/eh_call.cc (__cxa_call_terminate): Take void*.
* config/abi/pre/gnu.ver: Add it.

gcc/testsuite/ChangeLog:

* g++.dg/eh/terminate2.C: New test.

reload_cse_move2add: Handle trivial single_set:s

The reload_cse_move2add part of "postreload" handled only
insns whose PATTERN was a SET.  That excludes insns that
e.g. clobber a flags register, which it does only for
"simplicity".  The patch extends the "simplicity" to most
single_set insns.  For a subset of those insns there's still
an assumption; that the single_set of a PARALLEL insn is the
first element in the PARALLEL.  If the assumption fails,
it's no biggie; the optimization just isn't performed.
Don't let the name deceive you, this optimization doesn't
hit often, but as often (or as rarely) for LRA as for reload
at least on e.g. cris-elf where the biggest effect was seen
in reducing repeated addresses in copies from fixed-address
arrays, like in gcc.c-torture/compile/pr78694.c.

* postreload.cc (move2add_use_add2_insn): Handle
trivial single_sets.  Rename variable PAT to SET.
(move2add_use_add3_insn, reload_cse_move2add): Similar.

RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill

This patch would like to allow the mov and spill operation for the RVV
vfloat16*_t types. The involved machine mode includes VNx1HF, VNx2HF,
VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/riscv.md: Add vfloat16*_t to attr mode.
* config/riscv/vector-iterators.md: Add vfloat16*_t machine mode
to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew.
* config/riscv/vector.md: Add vfloat16*_t machine mode to sew,
vlmul and ratio.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mov-14.c: New test.
* gcc.target/riscv/rvv/base/spill-13.c: New test.

Daily bump.

[RISC-V] fix cfi issue in save-restore.

This patch fixes a cfi issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

Test code:
char my_getchar();
float getf();
int test_f0()
{
  int s0 = my_getchar();
  float f0 = getf();
  int b = my_getchar();
  return f0+s0+b;
}

cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow

before patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32

...

addi sp,sp,16
.cfi_def_cfa_offset 0  // issue here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset -16 // issue here
.cfi_endproc

after patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32

...

addi sp,sp,16
.cfi_def_cfa_offset 16  // corrected here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset 0 // corrected here
.cfi_endproc

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_epilogue): fix cfi issue with
correct offset.

Remove unnecessary md pattern for TARGET_XTHEADCONDMOV

There are 2 small changes in this patch, but they do not affect the result.

1. Remove unnecessary md pattern for TARGET_XTHEADCONDMOV in thead.md. The operands[4]
in "if_then_else" are always comparison operations, so the generated rtl does not match
the pattern that is expected to be deleted.

2. Change operands[4] from const0_rtx to operands[1] to maintain rtl consistency. Although
when output assembly, only operands[4] CODE will affect the output result.

Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/thead.md (*th_cond_gpr_mov<GPR:mode><GPR2:mode>): Delete.

Add more ForEachMacros to clang-format file

contrib/
* clang-format (ForEachMacros): Add missing cases
for EXECUTE_IF_... macros.

PR modula2/110003 Wrong source line listed for unused parameters

Ensure that the parameter token position is recorded for both
definition and implementation modules. The shadow variable
is created inside BuildFormalParameterSection. The shadow
variable needs to have the other definition or implementation module
token position set when CheckFormalParameterSection is called.
This allows the MetaError family of procedures to request the
implementation module token position when reporting unused parameters.

gcc/m2/ChangeLog:

PR modula2/110003
* gm2-compiler/P2SymBuild.mod (GetParameterShadowVar): Import.
(CheckFormalParameterSection): Call PutDeclared for the shadow
variable associated with the parameter.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: is_specialization_of_friend confusion [PR109923]

The check for a non-template member function of a class template in
is_specialization_of_friend is overbroad, and accidentally holds for a
non-template hidden friend too, which for the testcase below causes the
predicate to bogusly return true for

decl = void non_templ_friend(A<int>, A<void>)
friend_decl = void non_templ_friend(A<void>, A<void>)

This patch refines the check appropriately.

PR c++/109923

gcc/cp/ChangeLog:

* pt.cc (is_specialization_of_friend): Fix overbroad check for
a non-template member function of a class template.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend79.C: New test.

c++: simplify TEMPLATE_TEMPLATE_PARM hashing

r10-7815-gaa576f2a860c82 added special hashing for TEMPLATE_TEMPLATE_PARM
to work around non-lowered ttps having TYPE_CANONICAL set but lowered
ttps did not. But ever since r13-737-gd0ef9e06197d14 this is no longer
the case, and all ttps should now have TYPE_CANONICAL set. So this
special hashing is now unnecessary and we can fall back to always using
TYPE_CANONICAL.

gcc/cp/ChangeLog:

* pt.cc (iterative_hash_template_arg): Don't hash
TEMPLATE_TEMPLATE_PARM specially.

c++: replace in_template_function

All uses of in_template_function except for the one in cp_make_fname_decl
seem like they could be generalized to consider any template context.
To that end this patch replaces the predicate with a generalized
in_template_context predicate that returns true if we're inside any
template context. If we legitimately need to consider only function
contexts, as in cp_make_fname_decl, we can just additionally check e.g.
current_function_decl.

One concrete benefit of this, which the adjusted testcase below
demonstrates, is that we no longer instantiate/odr-use entities based on
uses within a non-function template.

gcc/cp/ChangeLog:

* class.cc (build_base_path): Check in_template_context instead
of in_template_function.
(resolves_to_fixed_type_p): Likewise.
* cp-tree.h (in_template_context): Define.
(in_template_function): Remove.
* decl.cc (cp_make_fname_decl): Check current_function_decl
and in_template_context instead of in_template_function.
* decl2.cc (mark_used): Check in_template_context instead of
in_template_function.
* pt.cc (in_template_function): Remove.
* semantics.cc (enforce_access): Check in_template_context
instead of current_template_parms directly.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Waddress-of-packed-member2.C: No longer expect a()
to be marked as odr-used.

c++: mangle noexcept-expr [PR70790]

This implements noexcept(expr) mangling and demangling as per the
Itanium ABI.

PR c++/70790

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Handle NOEXCEPT_EXPR.

libiberty/ChangeLog:

* cp-demangle.c (cplus_demangle_operators): Add the noexcept
operator.
(d_print_comp_inner) <case DEMANGLE_COMPONENT_UNARY>: Always
print parens around the operand of noexcept too.
* testsuite/demangle-expected: Test noexcept operator
demangling.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle78.C: New test.

fix radix sort on 32bit platforms [PR109670]

The radix sort uses two buffers, a1 for input and a2 for output.
After every digit the role of the two buffers is swapped.
When terminating the sort early the code made sure the output
was in a2. However, when we run out of bits, as can happen on
32bit platforms, the sorted result was in a1, as we had just
swapped a1 and a2.
This patch fixes the problem by unconditionally having a1 as
output after every loop iteration.

This bug manifested itself only on 32bit platforms and even then
only in some circumstances, as it needs frames where a swap
is required due to differences in the top-most byte, which is
affected by ASLR. The new logic was validated by exhaustive
search over 32bit input values.

libgcc/ChangeLog:
PR libgcc/109670
* unwind-dw2-fde.c: Fix radix sort buffer management.

release the sorted FDE array when deregistering a frame [PR109685]

The atomic fastpath bypasses the code that releases the sort
array which was lazily allocated during unwinding. We now
check after deregistering if there is an array to free.

libgcc/ChangeLog:
PR libgcc/109685
* unwind-dw2-fde.c: Free sort array in atomic fast path.

RISC-V: Fix warning in predicated.md

Notice there is warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function â€˜bool arith_operand_or_mode_mask(rtx, machine_mode)â€™:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))

gcc/ChangeLog:

* config/riscv/predicates.md: Change INTVAL into UINTVAL.

RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

This patch is to enhance vwmul.vv combine optimizations.
Consider this following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
      int16_t *__restrict dst3, int16_t *__restrict dst4,
      int8_t *__restrict a, int8_t *__restrict b,
      int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

In such complicate case, the operand is not single used, used by multiple statements.
GCC combine optimization will iterate the combination of the operands.

Also, we add another pattern of vwmulsu.vv to enhance the vwmulsu.vv optimization.
Currently, we have format:

(mult: (sign_extend) (zero_extend)) in vector.md for intrinsics calling.
Now, we add a new vwmulsu.ww with this format:
(mult: (zero_extend) (sign_extend))

To handle this following cases (sign and unsigned widening multiplication mixing codes):
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
      int16_t *__restrict dst3, int16_t *__restrict dst4,
      int8_t *__restrict a, uint8_t *__restrict b,
      uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

Before this patch:

...
        vsext.vf2       v6,v1
        add     t0,a0,t4
        vzext.vf2       v4,v1
        vmul.vv v2,v4,v6
        add     t0,a1,t4
        vzext.vf2       v2,v1
        vmul.vv v4,v2,v4
        add     t0,a2,t4
        vmul.vv v2,v2,v6
        add     t0,a3,t4
        sub     t6,t6,t1
        vsext.vf2       v2,v1
        vmul.vv v2,v2,v6
...

After this patch:
...
        add     t0,a0,t3
        vwmulsu.vv      v2,v1,v3
        add     t0,a1,t3
        vwmulu.vv       v4,v3,v2
        add     t0,a2,t3
        vwmulsu.vv      v3,v1,v2
        add     t0,a3,t3
        sub     t4,t4,t1
        vwmul.vv        v2,v1,v3
...

gcc/ChangeLog:

* config/riscv/vector.md: Add vector-opt.md.
* config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.

Daily bump.

Don't try bswap + rotate when TYPE_PRECISION(n->type) > n->range.

For the testcase in the PR, we have

  br64 = br;
  br64 = ((br64 << 16) & 0x000000ff00000000ull) | (br64 & 0x0000ff00ull);

  n->n: 0x3000000200.
  n->range: 32.
  n->type: uint64.

The original code assumes n->range is same as TYPE PRECISION(n->type),
and tries to rotate the mask from 0x300000200 -> 0x20300 which is
incorrect. The patch fixed this bug by not trying bswap + rotate when
TYPE_PRECISION(n->type) is not equal to n->range.

gcc/ChangeLog:

PR tree-optimization/110067
* gimple-ssa-store-merging.cc (find_bswap_or_nop): Don't try
bswap + rotate when TYPE_PRECISION(n->type) > n->range.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110067.c: New test.

i386: Add missing vector truncate patterns [PR92658].

Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector
truncate.

gcc/ChangeLog:

PR target/92658
* config/i386/mmx.md (truncv2hiv2qi2): New define_insn.
(truncv2si<mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr92658-avx512bw-trunc-2.c: New test.

rtl-optimization: [PR102733] DSE removing address which only differ by address space.

The problem here is DSE was not taking into account the address space
which meant if you had two addresses say `fs:0` and `gs:0` (on x86_64),
DSE would think they were the same and remove the first store.
This fixes that issue by adding a check for the address space too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR rtl-optimization/102733

gcc/ChangeLog:

* dse.cc (store_info): Add addrspace field.
(record_store): Record the address space
and check to make sure they are the same.

gcc/testsuite/ChangeLog:

* gcc.target/i386/addr-space-6.c: New test.

Fix PR 110042: ifcvt regression due to paradoxical subregs

After r14-1014-gc5df248509b489364c573e8, GCC started to emit
directly a zero_extract for `(t1&0x8)!=0`. This introduced
a small regression where ifcvt would not do the ifconversion
as there is now a paradoxical subreg in the dest which
was being rejected. Since paradoxical subreg set the whole
register, we can treat it as the same as a reg in the two places.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

PR rtl-optimization/110042
* ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs.
(bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST.

gcc/testsuite/ChangeLog:

PR rtl-optimization/110042
* gcc.target/aarch64/csel_bfx_2.c: New test.

Darwin, PPC: Fix struct layout with pragma pack [PR110044].

This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.

Fortran: fix diagnostics for SELECT RANK [PR100607]

gcc/fortran/ChangeLog:

PR fortran/100607
* resolve.cc (resolve_select_rank): Remove duplicate error.
(resolve_fl_var_and_proc): Prevent NULL pointer dereference and
suppress error message for temporary.

gcc/testsuite/ChangeLog:

PR fortran/100607
* gfortran.dg/select_rank_6.f90: New test.

btf: fix bootstrap -Wformat errors [PR110073]

Commit 7aae58b04b9 "btf: improve -dA comments for testsuite" broke
bootstrap on a number of architectures because it introduced some
new -Wformat errors.

Fix those errors by properly using PRIu64 and a small refactor to
the offending code.

Based on the suggested patch from Rainer Orth.

PR debug/110073

gcc/ChangeLog:

* btfout.cc (btf_absolute_func_id): New function.
(btf_asm_func_type): Call it here. Change index parameter from
size_t to ctf_id_t. Use PRIu64 formatter.

btf: Fix -Wformat errors

g:7aae58b04b92303ccda3ead600be98f0d4b7f462 introduced -Wformat errors
breaking bootstrap on some targets. This patch fixes that.

Committed as obvious.

gcc/ChangeLog:

* btfout.cc (btf_asm_type): Use PRIu64 instead of %lu for uint64_t.
(btf_asm_datasec_type): Likewise.

c++: fix explicit/copy problem [PR109247]

In the testcase, the user wants the assignment to use the operator= declared
in the class, but because [over.match.list] says that explicit constructors
are also considered for list-initialization, as affirmed in CWG1228, we end
up choosing the implicitly-declared copy assignment operator, using the
explicit constructor template for the argument, which is ill-formed. Other
implementations haven't implemented CWG1228, so we keep getting bug reports.

Discussion in CWG led to the idea for this targeted relaxation: if we use an
explicit constructor for the conversion to the argument of a copy or move
special member function, that makes the candidate worse than another.

DR 2735
PR c++/109247

gcc/cp/ChangeLog:

* call.cc (sfk_copy_or_move): New.
(joust): Add tiebreaker for explicit conv and copy ctor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-explicit3.C: New test.

rs6000: Fix arguments for __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrhx

The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *. This patch fixes the arguments in the two
builtins.

A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.

gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.

c++: make initializer_list array static again [PR110070]

After the maybe_init_list_as_* patches, I noticed that we were putting the
array of strings into .rodata, but then memcpying it into an automatic
array, which is pointless; we should be able to use it directly.

This doesn't happen automatically because TREE_ADDRESSABLE is set (since
r12-657 for PR100464), and so gimplify_init_constructor won't promote the
variable to static.  Theoretically we could do escape analysis to recognize
that the address, though taken, never leaves the function; that would allow
promotion when we're only using the address for indexing within the
function, as in initlist-opt2.C.  But this would be a new pass.

And in initlist-opt1.C, we're passing the array address to another function,
so it definitely escapes; it's only safe in this case because it's calling a
standard library function that we know only uses it for indexing.  So, a
flag seems needed.  I first thought to put the flag on the TARGET_EXPR, but
the VAR_DECL seems more appropriate.

In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE,
but I think DECL_MERGEABLE is a better name, especially if we're going to
apply it to the backing array of initializer_list, which is observable.  I
then also check it in places that check for -fmerge-all-constants, so that
multiple equivalent initializer-lists can also be combined.  And then it
seemed to make sense for [[no_unique_address]] to have this meaning for
user-written variables.

I think the note in [dcl.init.list]/6 intended to allow this kind of merging
for initializer_lists, but it didn't actually work; for an explicit array
with the same initializer, if the address escapes the program could tell
whether the same variable in two frames have the same address.  P2752 is
trying to correct this defect, so I'm going to assume that this is the
intent.

PR c++/110070
PR c++/105838

gcc/ChangeLog:

* tree.h (DECL_MERGEABLE): New.
* tree-core.h (struct tree_decl_common): Mention it.
* gimplify.cc (gimplify_init_constructor): Check it.
* cgraph.cc (symtab_node::address_can_be_compared_p): Likewise.
* varasm.cc (categorize_decl_for_section): Likewise.

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE.
(convert_like_internal) [ck_list]: Set it.
(set_up_extended_ref_temp): Copy it.
* tree.cc (handle_no_unique_addr_attribute): Set it.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/initlist-opt1.C: Check for static array.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.
* g++.dg/tree-ssa/initlist-opt4.C: New test.
* g++.dg/opt/icf1.C: New test.
* g++.dg/opt/icf2.C: New test.
* g++.dg/opt/icf3.C: New test.
* g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.

reg-stack: Change return type of predicate functions from int to bool

Also change some internal variables to bool and recode handling of
boolean varialbes to not use bitwise or.

gcc/ChangeLog:

* rtl.h (stack_regs_mentioned): Change return type from int to bool.
* reg-stack.cc (struct_block_info_def): Change "done" to bool.
(stack_regs_mentioned_p): Change return type from int to bool
and adjust function body accordingly.
(stack_regs_mentioned): Ditto.
(check_asm_stack_operands): Ditto.  Change "malformed_asm"
variable to bool.
(move_for_stack_reg): Recode handling of control_flow_insn_deleted.
(swap_rtx_condition_1): Change return type from int to bool
and adjust function body accordingly.  Change "r" variable to bool.
(swap_rtx_condition): Change return type from int to bool
and adjust function body accordingly.
(subst_stack_regs_pat): Recode handling of control_flow_insn_deleted.
(subst_stack_regs): Ditto.
(convert_regs_entry): Change return type from int to bool and adjust
function body accordingly.  Change "inserted" variable to bool.
(convert_regs_1): Recode handling of control_flow_insn_deleted.
(convert_regs_2): Recode handling of cfg_altered.
(convert_regs): Ditto.  Change "inserted" variable to bool.

varasm: check float size

In PR95226, the testcase was failing because we tried to output_constant a
NOP_EXPR to float from a double REAL_CST, and so we output a double where
the caller wanted a float. That doesn't happen anymore, but with the
output_constant hunk we will ICE in that situation rather than emit the
wrong number of bytes.

Part of the problem was that initializer_constant_valid_p_1 returned true
for that NOP_EXPR, because it compared the sizes of integer types but not
floating-point types. So the C++ front end assumed it didn't need to fold
the initializer.

PR c++/95226

gcc/ChangeLog:

* varasm.cc (output_constant) [REAL_TYPE]: Check that sizes match.
(initializer_constant_valid_p_1): Compare float precision.

analyzer: implement various atomic builtins [PR109015]

This patch implements many of the __atomic_* builtins from
sync-builtins.def as known_function subclasses within the analyzer.

gcc/analyzer/ChangeLog:
PR analyzer/109015
* kf.cc (class kf_atomic_exchange): New.
(class kf_atomic_exchange_n): New.
(class kf_atomic_fetch_op): New.
(class kf_atomic_op_fetch): New.
(class kf_atomic_load): New.
(class kf_atomic_load_n): New.
(class kf_atomic_store_n): New.
(register_atomic_builtins): New function.
(register_known_functions): Call register_atomic_builtins.

gcc/testsuite/ChangeLog:
PR analyzer/109015
* gcc.dg/analyzer/atomic-builtins-1.c: New test.
* gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c: New test.
* gcc.dg/analyzer/atomic-builtins-qemu-sockets.c: New test.
* gcc.dg/analyzer/atomic-types-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: regions in different memory spaces can't alias

gcc/analyzer/ChangeLog:
* store.cc (store::eval_alias_1): Regions in different memory
spaces can't alias.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: Require LTO for pr107557-[12].c

pr107557-[12].c invoke -flto option but do not check that the target
support LTO. This patch adds dg-require lto to the testcases.

* gcc.dg/pr107557-1.c: Require LTO support.
* gcc.dg/pr107557-2.c: Require LTO support.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

doc: clarify semantics of vector bitwise shifts

Explicitly say that attempted shift past element bit width is UB for
vector types. Mention that integer promotions do not happen.

gcc/ChangeLog:

* doc/extend.texi (Vector Extensions): Clarify bitwise shift
semantics.

VECT: Change flow of decrement IV

Follow Richi's suggestion, I change current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.

Include fixes from kewen.

This patch will need to wait for Kewen's test feedback.

Testing on X86 is on-going

Co-Authored by: Kewen Lin  <linkw@linux.ibm.com>

  PR tree-optimization/109971

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

target/110088: Improve operation of l-reg with const after move from d-reg.

After reload, there may be sequences like
   lreg = dreg
   lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
   dreg = dreg <op> const
   lreg = dreg
instead which is more efficient.

gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.

libstdc++: Fix broken _GLIBCXX_PARALLEL mode

Add missing <parallel/search.h> include in <parallel/algobase.h>.

libstdc++-v3/ChangeLog:

* include/parallel/algobase.h: Include <parallel/search.h>.

Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]

Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba
"Support parallel testing in libgomp, part II [PR66005]"
("..., and enable if 'flock' is available for serializing execution testing"),
where we saw:

> On my Dell Precision 7530 laptop:
>
>     $ uname -srvi
>     Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
>     $ grep '^model name' < /proc/cpuinfo | uniq -c
>          12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
>     $ nvidia-smi -L
>     GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
>
> ... [...]: case (c) standard configuration, no offloading
> configured, [...]

>     $ \time make check-target-libgomp
>
> Case (c), baseline; [...]:
>
>     1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
>     1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k
>
> Case (c), parallelized [using 'flock']:
>
> [...]
>     -j12 GCC_TEST_PARALLEL_SLOTS=12
>     2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
>     2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k

Quite the same when instead of 'flock' using this fallback Perl 'flock':

    2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k
    2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k

PR testsuite/66005
gcc/
* doc/install.texi: Document (optional) Perl usage for parallel
testing of libgomp.
libgomp/
* testsuite/lib/libgomp.exp: 'flock' through stdout.
* testsuite/flock: New.
* configure.ac (FLOCK): Point to that if no 'flock' available, but
'perl' is.
* configure: Regenerate.

Remove stale Autoconf checks for Perl

Subversion r110220 (Git commit 03b8fe495d716c004f5491eb2347537f115ab2d8) for
PR25884 "libgomp should not require perl to compile" removed all '$(PERL)'
usage from libgomp -- but didn't remove the then-unused Autoconf Perl check
itself. Later, this Autoconf Perl check appears to have been copied from
libgomp into other GCC libraries, likewise unused.

libgomp/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
libatomic/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
libgm2/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* libm2cor/Makefile.in: Likewise.
* libm2iso/Makefile.in: Likewise.
* libm2log/Makefile.in: Likewise.
* libm2min/Makefile.in: Likewise.
* libm2pim/Makefile.in: Likewise.
libitm/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.

Back to requiring "Perl version 5.6.1 (or later)" [PR82856]

With Subversion r265695 (Git commit 22e052725189a472e4e86ebb6595278a49f4bcdd)
"Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856)" we're back
to normal; per Automake 1.15.1 'configure.ac' still "[...] perl 5.6 or better
is required [...]".

PR bootstrap/82856
gcc/
* doc/install.texi (Perl): Back to requiring "Perl version 5.6.1 (or
later)".

Fortran: Fix some problems blocking associate meta-bug [PR87477]

2023-06-02 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/87477
* parse.cc (parse_associate): Replace the existing evaluation
of the target rank with calls to gfc_resolve_ref and
gfc_expression_rank. Identify untyped target function results
with structure constructors by finding the appropriate derived
type.
* resolve.cc (resolve_symbol): Allow associate variables to be
assumed shape.

gcc/testsuite/
PR fortran/87477
* gfortran.dg/associate_54.f90 : Cope with extra error.

PR fortran/102109
* gfortran.dg/pr102109.f90 : New test.

PR fortran/102112
* gfortran.dg/pr102112.f90 : New test.

PR fortran/102190
* gfortran.dg/pr102190.f90 : New test.

PR fortran/102532
* gfortran.dg/pr102532.f90 : New test.

PR fortran/109948
* gfortran.dg/pr109948.f90 : New test.

PR fortran/99326
* gfortran.dg/pr99326.f90 : New test.

RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

Base on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233

Add _mu C++ overloaded intrinsics for load && viota && vid.

Co-authored-by: KuanLin Chen <best124612@gmail.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.

RISC-V: Optimize reverse series index vector

This patch optimizes the following seriese vector:
[nunits - 1, nunits - 2, ...., 0]

Before this patch:
vid
vmul
vadd

After this patch:
vid
vrsub

This patch is an obvious and simple optimization, ok for trunk?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.

RISC-V: Fix warning in predicated.md

Notice there is warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))

gcc/ChangeLog:

* config/riscv/predicates.md: Change INTVAL into UINTVAL.

MAINTAINERS: Add myself as MIPS port maintainer

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as MIPS
port maintainer.
(Write After Approval): Remove myself.

RISC-V: Add test for vfloat16*_t (non tuple) types

This patch would like to add some test cases of vfloat16*_t (non tuple),
no 'zvfh' or 'zvfhmin' will meet unknown type.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-16.c: Add test cases.
* gcc.target/riscv/rvv/base/user-7.c: Likewise.

RISC-V: Add __RISCV_ prefix to VXRM and FRM enum

According to doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

Add __RISCV_ prefix to VXRM and FRM enum.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add
__RISCV_ prefix.
(DEF_RVV_FRM_ENUM): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-10.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-11.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-12.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-6.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-7.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.

RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

1. This patch optimize the codegen of the following auto-vectorization codes:

void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
    for (int i = 0; i < n; i++)
      c[i] = (int64_t)a[i] + b[i];
}

Combine instruction from:

...
vsext.vf2
vadd.vv
...

into:

...
vwadd.wv
...

Since for PLUS operation, GCC prefer the following RTL operand order when combining:

(plus: (sign_extend:..)
       (reg:)

instead of

(plus: (reg:..)
       (sign_extend:)

which is different from MINUS pattern.

I split patterns of vwadd/vwsub, and add dedicated patterns for them.

2. This patch not only optimize the case as above (1) mentioned, also enhance vwadd.vv/vwsub.vv
   optimization for complicate PLUS/MINUS codes, consider this following codes:

__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
      int16_t *__restrict dst3, int8_t *__restrict a,
      int8_t *__restrict b, int8_t *__restrict a2,
      int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] + (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
    }
}

Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v  v2,0(a3)
vle8.v  v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2       v3,v2
vsext.vf2       v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v  v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2       v1,v4
vadd.vv v2,v1,v2
...

After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...

The reason why current upstream GCC can not optimize codes using vwadd thoroughly is combine PASS
needs intermediate RTL IR (extend one of the operand pattern (vwadd.wv)), then base on this intermediate
RTL IR, extend the other operand to generate vwadd.vv.

So vwadd.wv/vwsub.wv definitely helps to vwadd.vv/vwsub.vv code optimizations.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv
intrinsic API expander
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

RISC-V: Support RVV permutation auto-vectorization

This patch supports vector permutation for VLS only by vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.

Fixed following comments from Robin.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_perm<mode>): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.

Daily bump.

Fortran: force error on bad KIND specifier [PR88552]

gcc/fortran/ChangeLog:

PR fortran/88552
* decl.cc (gfc_match_kind_spec): Use error path on missing right
parenthesis.
(gfc_match_decl_type_spec): Use error return when an error occurred
during matching a KIND specifier.

gcc/testsuite/ChangeLog:

PR fortran/88552
* gfortran.dg/pr88552.f90: New test.

testsuite: print any leaking torture options for debugging

This was helpful when debugging the recent multilib testsuite failure.

gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

testsuite: Unbork multilib setups using -march flags (RISC-V)

RISC-V multilib testing is currently busted with follow splat all over:

|    Schedule of variations:
|        riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow
|        riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
|        riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
|        riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
...
...
| ERROR: tcl error code NONE
| ERROR: torture-init: torture_without_loops is not empty as expected

causing insane amount of false failures.

|               ========= Summary of gcc testsuite =========
|                            | # of unexpected case / # of unique unexpected case
|                            |          gcc |          g++ |     gfortran |
| rv64imafdc/  lp64d/ medlow | 5421 /     4 |    1 /     1 |    6 /     1 |
| rv32imafdc/ ilp32d/ medlow | 5422 /     5 |    3 /     2 |    6 /     1 |
|   rv32imac/  ilp32/ medlow |  391 /     5 |    3 /     2 |   43 /     8 |
|   rv64imac/   lp64/ medlow | 5422 /     5 |    1 /     1 |   43 /     8 |

The error splat itself is from recent test harness improvements for stricter
checks for torture-{init,finish} pairing. But the real issue is a latent bug
from 2009: commit 3dd1415dc88, ("i386-prefetch.exp: Skip tests when multilib
flags contain -march") which added an "early exit" condition to i386-prefetch.exp
which could potentially cause an unpaired torture-{init,finish}.

The early exit only happens in a multlib setup using -march in flags
which is what RISC-V happens to use, hence the reason this was only seen
on RISC-V multilib testing.

Moving the early exit outside of torture-{init,finish} bracket
reinstates RISC-V testing.

| rv64imafdc/  lp64d/ medlow |    3 /     2 |    1 /     1 |    6 /     1 |
| rv32imafdc/ ilp32d/ medlow |    4 /     3 |    3 /     2 |    6 /     1 |
|   rv32imac/  ilp32/ medlow |    3 /     2 |    3 /     2 |   43 /     8 |
|   rv64imac/   lp64/ medlow |    5 /     4 |    1 /     1 |   43 /     8 |

gcc/testsuite:
* gcc.misc-tests/i386-prefetch.exp: Move early return outside
the torture-{init,finish}

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

doc: improve docs for -pedantic{,-errors}

Recent discussion of -Wimplicit led me to want to clarify this section of
the documentation, and mark which diagnostics other than -Wpedantic are
affected by -pedantic-errors.

gcc/ChangeLog:

* doc/invoke.texi (-Wpedantic): Improve clarity.

testsuite: Skip powerpc tests on AIX.

AIX does not support -mstrict-align.

pr109566.c had skip directive in wrong order for DejaGNU.

* gcc.target/powerpc/pr100106-sa.c: Skip on AIX.
* gcc.target/powerpc/pr109566.c: Skip on AIX.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

libstdc++: Fix PSTL test that fails in C++20

This test fails in C++20 and later due to a warning:

warning: C++20 says that these are ambiguous, even though the second is reversed:
note: candidate 1: 'bool MyClass::operator==(const MyClass&)'
note: candidate 2: 'bool MyClass::operator==(const MyClass&)' (reversed)
note: try making the operator a 'const' member function
FAIL: 26_numerics/pstl/numeric_ops/transform_reduce.cc (test for excess errors)

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/pstl/numeric_ops/transform_reduce.cc:
Add const to equality operator.

libstdc++: Do not use std::expected::value() in monadic ops (LWG 3938)

The monadic operations in std::expected always check has_value() so we
can avoid the execptional path in value() and the assertions in error()
by accessing _M_val and _M_unex directly. This means that the monadic
operations no longer require _M_unex to be copyable so that it can be
thrown from value(), as modified by LWG 3938.

This also fixes two incorrect uses of std::move in transform(F&&)& and
transform(F&&) const& which I found while making these changes.

Now that move-only error types are supported, it's possible to properly
test the constraints that LWG 3877 added to and_then and transform. The
lwg3877.cc test now does that.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::and_then, expected::or_else)
(expected::transform_error): Use _M_val and _M_unex instead of
calling value() and error(), as per LWG 3938.
(expected::transform): Likewise. Remove incorrect std::move
calls from lvalue overloads.
(expected<void, E>::and_then, expected<void, E>::or_else)
(expected<void, E>::transform): Use _M_unex instead of calling
error().
* testsuite/20_util/expected/lwg3877.cc: Add checks for and_then
and transform, and for std::expected<void, E>.
* testsuite/20_util/expected/lwg3938.cc: New test.

libstdc++: Fix code size regressions in std::vector [PR110060]

My r14-1452-gfb409a15d9babc change to add optimization hints to
std::vector causes regressions because it makes std::vector::size() and
std::vector::capacity() too big to inline. That's the opposite of what
I wanted, so revert the changes to those functions.

To achieve the original aim of optimizing vec.assign(vec.size(), x) we
can add a local optimization hint to _M_fill_assign, so that it doesn't
affect all other uses of size() and capacity().

Additionally, add the same hint to the _M_assign_aux overload for
forward iterators and add that to the testcase.

It would be nice to similarly optimize:
if (vec1.size() == vec2.size()) vec1 = vec2;
but adding hints to operator=(const vector&) doesn't help. Presumably
the relationships between the two sizes and two capacities are too
complex to track effectively.

libstdc++-v3/ChangeLog:

PR libstdc++/110060
* include/bits/stl_vector.h (_Vector_base::_M_invariant):
Remove.
(vector::size, vector::capacity): Remove calls to _M_invariant.
* include/bits/vector.tcc (vector::_M_fill_assign): Add
optimization hint to reallocating path.
(vector::_M_assign_aux(FwdIter, FwdIter, forward_iterator_tag)):
Likewise.
* testsuite/23_containers/vector/capacity/invariant.cc: Moved
to...
* testsuite/23_containers/vector/modifiers/assign/no_realloc.cc:
...here. Check assign(FwdIter, FwdIter) too.
* testsuite/23_containers/vector/types/1.cc: Revert addition
of -Wno-stringop-overread option.

libstdc++: Document removal of implicit allocator rebinding extensions

Traditionally libstdc++ allowed containers and strings to be
instantiated with allocator's that have the wrong value type, implicitly
rebinding the allocator to the container's value type. Since C++20 that
has been explicitly ill-formed, so the extension is no longer supported
in strict modes (e.g. -std=c++17) and in C++20 and later.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document removal of implicit
allocator rebinding extensions in strict mode and for C++20.
* doc/html/*: Regenerate.

cse: Change return type of predicate functions from int to bool

Also change some function arguments to bool and remove one instance
of always zero function argument.

gcc/ChangeLog:

* rtl.h (exp_equiv_p): Change return type from int to bool.
* cse.cc (mention_regs): Change return type from int to bool
and adjust function body accordingly.
(exp_equiv_p): Ditto.
(insert_regs): Ditto. Change "modified" function argument to bool
and update usage accordingly.
(record_jump_cond): Remove always zero "reversed_nonequality"
function argument and update usage accordingly.
(fold_rtx): Change "changed" variable to bool.
(record_jump_equiv): Remove unneeded "reversed_nonequality" variable.
(is_dead_reg): Change return type from int to bool.

xtensa: Add 'adddi3' and 'subdi3' insn patterns

More optimized than the default RTL generation.

gcc/ChangeLog:

* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruc-
tion idioms described in the Xtensa ISA reference manual (p. 600).

PR target/109973: CCZmode and CCCmode variants of [v]ptest on x86.

This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.

This conveniently matches Intel's intrinsics, that provide different
functions for retrieving different flags, _mm_testz_si128 tests the
Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.

The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.

2023-06-01  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
PR target/109973
* config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
CODE_for_sse4_1_ptestzv2di.
(__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
(__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
(__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
* config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
when expanding UNSPEC_PTEST to compare against zero.
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
(general_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
(timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
* config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
* config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
check for suitable matching modes for the UNSPEC_PTEST pattern.
* config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
(*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.  Remove
":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
(<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
(<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
(<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
current behavior.
(*ptest<mode>_and): Specify CCZ to only perform this optimization
when only the Z flag is required.

gcc/testsuite/ChangeLog
PR target/109973
* gcc.target/i386/pr109973-1.c: New test case.
* gcc.target/i386/pr109973-2.c: Likewise.

libstdc++: optimize EH phase 2

In the ABI's two-phase EH model, first we walk the stack looking for a
handler, then we walk the stack running cleanups until we reach that
handler.  In the cleanup phase, we shouldn't redundantly check the handlers
along the way, e.g. when walking through g():

  void f() { throw 42; }
  void g() { try { f(); } catch (void *) { } }
  int main() { try { g(); } catch (int) { } }

libstdc++-v3/ChangeLog:

* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Don't check
handlers in the cleanup phase.

doc: Fix description of x86 -m32 option [PR109954]

This option does not imply -march=i386 so it's incorrect to say it
generates code that will run on "any i386 system".

gcc/ChangeLog:

PR target/109954
* doc/invoke.texi (x86 Options): Fix description of -m32 option.

libstdc++: Fix condition for supported SIMD types on ARMv8

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/110050
* include/experimental/bits/simd.h (__vectorized_sizeof): With
__have_neon_a32 only single-precision float works (in addition
to integers).

aarch64: Add =r,m and =m,r alternatives to 64-bit vector move patterns

We can use the X registers to load and store 64-bit vector modes, we just need to add the alternatives
to the mov patterns. This straightforward patch does that and for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =r,m alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/xreg-vec-modes_1.c: New test.

OpenMP/Fortran: Permit pure directives inside PURE

Update permitted directives for directives marked in OpenMP's 5.2 as pure.
To ensure that list is updated, unimplemented directives are placed into
pure-2.f90 such the test FAILs once a known to be pure directive is
implemented without handling its pureness.

gcc/fortran/ChangeLog:

* parse.cc (decode_omp_directive): Accept all pure directives
inside a PURE procedures; handle 'error at(execution).

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.2): Mark pure-directive handling as 'Y'.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/nothing-2.f90: Remove one dg-error.
* gfortran.dg/gomp/pr79154-2.f90: Update expected dg-error wording.
* gfortran.dg/gomp/pr79154-simd.f90: Likewise.
* gfortran.dg/gomp/pure-1.f90: New test.
* gfortran.dg/gomp/pure-2.f90: New test.
* gfortran.dg/gomp/pure-3.f90: New test.
* gfortran.dg/gomp/pure-4.f90: New test.