git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Fix condition for supported SIMD types on ARMv8

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/110050
* include/experimental/bits/simd.h (__vectorized_sizeof): With
__have_neon_a32 only single-precision float works (in addition
to integers).

aarch64: Add =r,m and =m,r alternatives to 64-bit vector move patterns

We can use the X registers to load and store 64-bit vector modes, we just need to add the alternatives
to the mov patterns. This straightforward patch does that and for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =r,m alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/xreg-vec-modes_1.c: New test.

OpenMP/Fortran: Permit pure directives inside PURE

Update permitted directives for directives marked in OpenMP's 5.2 as pure.
To ensure that list is updated, unimplemented directives are placed into
pure-2.f90 such the test FAILs once a known to be pure directive is
implemented without handling its pureness.

gcc/fortran/ChangeLog:

* parse.cc (decode_omp_directive): Accept all pure directives
inside a PURE procedures; handle 'error at(execution).

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.2): Mark pure-directive handling as 'Y'.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/nothing-2.f90: Remove one dg-error.
* gfortran.dg/gomp/pr79154-2.f90: Update expected dg-error wording.
* gfortran.dg/gomp/pr79154-simd.f90: Likewise.
* gfortran.dg/gomp/pure-1.f90: New test.
* gfortran.dg/gomp/pure-2.f90: New test.
* gfortran.dg/gomp/pure-3.f90: New test.
* gfortran.dg/gomp/pure-4.f90: New test.

RISC-V: Introduce vfloat16m{f}*_t and their machine mode.

This patch would like to introduce the built-in type vfloat16m{f}*_t, as
well as their machine mode VNx*HF. They depend on architecture zvfhmin
or zvfh.

When givn the zvfhmin or zvfh, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.

The underlying PATCH will implement the zvfhmin extension based on this.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.

libstdc++: Reduce <functional> inclusion to <stl_algobase.h>

Move the std::search definition from stl_algo.h to stl_algobase.h and use
the later in <functional>.

For consistency also move std::__parallel::search and associated helpers from
<parallel/stl_algo.h> to <parallel/stl_algobase.h> so that std::__parallel::search
is accessible along with std::search.

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h
(std::__search, std::search(_FwdIt1, _FwdIt1, _FwdIt2, _FwdIt2, _BinPred)): Move...
* include/bits/stl_algobase.h: ...here.
* include/std/functional: Replace <stl_algo.h> include by <stl_algobase.h>.
* include/parallel/algo.h (std::__parallel::search<_FIt1, _FIt2, _BinaryPred>)
(std::__parallel::__search_switch<_FIt1, _FIt2, _BinaryPred, _ItTag1, _ItTag2>):
Move...
* include/parallel/algobase.h: ...here.
* include/experimental/functional: Remove <bits/stl_algo.h> and <parallel/algorithm>
includes. Include <bits/stl_algobase.h>.

c++: make -fpermissive avoid -Werror=narrowing

Currently we make -Wnarrowing an error by default by forcing pedantic_errors
on, but for consistency -fpermissive should prevent that.

In general I'm inclined to move away from using permerror in favor of this
kind of model, with specific flags for each diagnostic.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Check flag_permissive.

Daily bump.

RISC-V: Add RVV FRM enum for floating-point rounding mode intriniscs

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (register_frm): New function.
(DEF_RVV_FRM_ENUM): New macro.
(handle_pragma_vector): Add FRM enum
* config/riscv/riscv-vector-builtins.def (DEF_RVV_FRM_ENUM): New macro.
(RNE): Ditto.
(RTZ): Ditto.
(RDN): Ditto.
(RUP): Ditto.
(RMM): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/frm-1.c: New test.

Refactor wi::bswap as a function (instead of a method).

This patch implements Richard Sandiford's suggestion from
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html
that wi::bswap (and a new wi::bitreverse) should be functions,
and ideally only accessors are member functions. This patch
implements the first step, moving/refactoring wi::bswap.

2023-05-31 Roger Sayle <roger@nextmovesoftware.com>
 Richard Sandiford <richard.sandiford@arm.com>

gcc/ChangeLog
* fold-const-call.cc (fold_const_call_ss) <CFN_BUILT_IN_BSWAP*>:
Update call to wi::bswap.
* simplify-rtx.cc (simplify_const_unary_operation) <case BSWAP>:
Update call to wi::bswap.
* tree-ssa-ccp.cc (evaluate_stmt) <case BUILT_IN_BSWAP*>:
Update calls to wi::bswap.

* wide-int.cc (wide_int_storage::bswap): Remove/rename to...
(wi::bswap_large): New function, with revised API.
* wide-int.h (wi::bswap): New (template) function prototype.
(wide_int_storage::bswap): Remove method.
(sext_large, zext_large): Consistent indentation/line wrapping.
(bswap_large): Prototype helper function containing implementation.
(wi::bswap): New template wrapper around bswap_large.

libstdc++: Add separate autoconf macro for std::float_t and std::double_t [PR109818]

This should make it possible to use openlibm with djgpp (and other
targets with missing C99 <math.h> functions). The <math.h> from openlibm
provides all the functions, but not the float_t and double_t typedefs.
By separating the autoconf checks for the functionsand the typedefs, we
don't disable support for all the functions just because those typedefs
are not present.

libstdc++-v3/ChangeLog:

PR libstdc++/109818
* acinclude.m4 (GLIBCXX_ENABLE_C99): Add separate check for
float_t and double_t and define HAVE_C99_FLT_EVAL_TYPES.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/cmath (float_t, double_t): Guard using new
_GLIBCXX_HAVE_C99_FLT_EVAL_TYPES macro.

libstdc++: Stop using _GLIBCXX_USE_C99_MATH_TR1 in <cmath>

Similar to the three commits r14-908, r14-909 and r14-910, the
_GLIBCXX_USE_C99_MATH_TR1 macro is misleading when it is also used for
<cmath>, not only for <tr1/cmath> headers. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define the
C99 features for C++11 but not for C++98.

Add separate configure checks for the <math.h> functions using
-std=c++11 for the checks. Use the new macro defined by those checks in
the C++11-specific parts of <cmath>, and in <complex>, <random> etc.

The check that defines _GLIBCXX_NO_C99_ROUNDING_FUNCS is only needed for
the C++11 <cmath> checks, so remove that from GLIBCXX_CHECK_C99_TR1 and
only do it for GLIBCXX_ENABLE_C99.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_C99): Add checks for C99 math
functions and define _GLIBCXX_USE_C99_MATH_FUNCS. Move checks
for C99 rounding functions to here.
(GLIBCXX_CHECK_C99_TR1): Remove checks for C99 rounding
functions from here.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/bits/random.h: Use _GLIBCXX_USE_C99_MATH_FUNCS instead
of _GLIBCXX_USE_C99_MATH_TR1.
* include/bits/random.tcc: Likewise.
* include/c_compatibility/math.h: Likewise.
* include/c_global/cmath: Likewise.
* include/ext/random: Likewise.
* include/ext/random.tcc: Likewise.
* include/std/complex: Likewise.
* testsuite/20_util/from_chars/4.cc: Likewise.
* testsuite/20_util/from_chars/8.cc: Likewise.
* testsuite/26_numerics/complex/proj.cc: Likewise.
* testsuite/26_numerics/headers/cmath/60401.cc: Likewise.
* testsuite/26_numerics/headers/cmath/types_std_c++0x.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_v3_target_cstdint):
Likewise.
* testsuite/util/testsuite_random.h: Likewise.

libstdc++: Express std::vector's size() <= capacity() invariant in code

This adds optimizer hints so that GCC knows that size() <= capacity() is
always true. This allows the compiler to optimize away re-allocating
paths when assigning new values to the vector without resizing it, e.g.,
vec.assign(vec.size(), new_val).

libstdc++-v3/ChangeLog:

* include/bits/stl_vector.h (_Vector_base::_M_invariant()): New
function.
(vector::size(), vector::capacity()): Call _M_invariant().
* testsuite/23_containers/vector/capacity/invariant.cc: New test.
* testsuite/23_containers/vector/types/1.cc: Add suppression for
false positive warning (PR110060).

libstdc++: Fix build for targets without _Float128 [PR109921]

My r14-1431-g7037e7b6e4ac41 change caused the _Float128 overload to be
compiled unconditionally, by moving the USE_STRTOF128_FOR_FROM_CHARS
check into the function body. That function should still only be
compiled if the target actually supports _Float128.

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc: Check __FLT128_MANT_DIG__ is
defined before trying to use _Float128.

libstdc++: Fix configure test for 32-bit targets

The -mlarge model for msp430-elf uses 20-bit pointers, which means that
sizeof(void*) == 4 and so the r14-1432-g51cf0b3949b88b change gives the
wrong answer. Check __INTPTR_WIDTH__ >= 32 instead.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Fix for 32-bit pointers
to check __INT_PTR_WIDTH__ instead of sizeof(void*).
* configure: Regenerate.

testsuite: rename force_conventional_output

The procedure force_conventional_output_for is a bit misnomed, what it
primarily does is to set the required options for the corresponding
test. So rename the proc to set_required_options_for and also rename the
participating variable accordingly.

gcc/testsuite/ChangeLog:

* lib/gcc-dg.exp: Rename gcc_force_conventional_output to
gcc_set_required_options.
* lib/target-supports.exp: Rename force_conventional_output_for
to set_required_options_for.
* lib/scanasm.exp: Adjust callers.
* lib/scanrtl.exp: Same.

aarch64: PR target/99195 Annotate dot-product patterns for vec-concat-zero

This straightforward patch annotates the dotproduct instructions, including the i8mm ones.
Tests included.
Nothing unexpected here.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi><vczle><vczbe>): ... This.
(usdot_prod<vsi2qi>): Rename to...
(usdot_prod<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_lane<vsi2qi>): Rename to...
(aarch64_<sur>dot_lane<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_laneq<vsi2qi>): Rename to...
(aarch64_<sur>dot_laneq<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi>): Rename to...
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi><vczle><vczbe>):
... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_11.c: New test.

aarch64: PR target/99195 Annotate saturating mult patterns for vec-concat-zero

This patch goes through the various alphabet soup saturating multiplication patterns, including those in TARGET_RDMA
and annotates them with <vczle><vczbe>. Many other patterns are widening and always write the full 128-bit vectors
so this annotation doesn't apply to them. Nothing out of the ordinary in this patch.

Bootstrapped and tested on aarch64-none-linux and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_sq<r>dmulh<mode>): Rename to...
(aarch64_sq<r>dmulh<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_n<mode>): Rename to...
(aarch64_sq<r>dmulh_n<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_lane<mode>): Rename to...
(aarch64_sq<r>dmulh_lane<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_laneq<mode>): Rename to...
(aarch64_sq<r>dmulh_laneq<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for qdmulh, qrdmulh.
* gcc.target/aarch64/simd/pr99195_10.c: New test.

btf: improve -dA comments for testsuite

Many BTF type kinds refer to other types via index to the final types
list. However, the order of the final types list is not guaranteed to
remain the same for the same source program between different runs of
the compiler, making it difficult to test inter-type references.

This patch updates the assembler comments output when writing a
given BTF record to include minimal information about the referenced
type, if any. This allows for the regular expressions used in the gcc
testsuite to do some basic integrity checks on inter-type references.

For example, for the type

unsigned int *

Assembly comments like the following are written with -dA:

.4byte 0 ; TYPE 2 BTF_KIND_PTR ''
.4byte 0x2000000 ; btt_info: kind=2, kflag=0, vlen=0
.4byte 0x1 ; btt_type: (BTF_KIND_INT 'unsigned int')

Several BTF tests which can immediately be made more robust with this
change are updated. It will also be useful in new tests for the upcoming
btf_type_tag support.

gcc/

* btfout.cc (btf_kind_names): New.
(btf_kind_name): New.
(btf_absolute_var_id): New utility function.
(btf_relative_var_id): Likewise.
(btf_relative_func_id): Likewise.
(btf_absolute_datasec_id): Likewise.
(btf_asm_type_ref): New.
(btf_asm_type): Update asm comments and use btf_asm_type_ref ().
(btf_asm_array): Likewise. Accept ctf_container_ref parameter.
(btf_asm_varent): Likewise.
(btf_asm_func_arg): Likewise.
(btf_asm_datasec_entry): Likewise.
(btf_asm_datasec_type): Likewise.
(btf_asm_func_type): Likewise. Add index parameter.
(btf_asm_enum_const): Likewise.
(btf_asm_sou_member): Likewise.
(output_btf_vars): Update btf_asm_* call accordingly.
(output_asm_btf_sou_fields): Likewise.
(output_asm_btf_enum_list): Likewise.
(output_asm_btf_func_args_list): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_func_types): Add ctf_container_ref parameter.
Pass it to btf_asm_func_type.
(output_btf_datasec_types): Update btf_asm_datsec_type call similarly.
(btf_output): Update output_btf_func_types call similarly.

gcc/testsuite/

* gcc.dg/debug/btf/btf-array-1.c: Use new BTF asm comments
in scan-assembler expressions where useful.
* gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-2.c: Likewise.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-function-6.c: Likewise.
* gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-2.c: Likewise.
* gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
* gcc.dg/debug/btf/btf-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-2.c: Likewise. Update outdated comment.
* gcc.dg/debug/btf/btf-function-3.c: Update outdated comment.

btf: be clear when record size/type is not used

All BTF type records have a 4-byte field used to encode a size or link
to another type, depending on the type kind. But BTF_KIND_ARRAY and
BTF_KIND_FWD do not use this field at all, and should write zero.

GCC already correctly writes zero in this field for these type kinds,
but the process is not straightforward and results in the -dA comment
claiming the field is a reference to another type. This patch makes
the behavior explicit and updates the assembler comment to state
clearly that the field is unused.

gcc/

* btfout.cc (btf_asm_type): Add dedicated cases for BTF_KIND_ARRAY
and BTF_KIND_FWD which do not use the size/type field at all.

emit-rtl: Change return type of predicate functions from int to bool

Also fix some stalled comments.

gcc/ChangeLog:

* rtl.h (subreg_lowpart_p): Change return type from int to bool.
(active_insn_p): Ditto.
(in_sequence_p): Ditto.
(unshare_all_rtl): Change return type from int to void.
* emit-rtl.h (mem_expr_equal_p): Change return type from int to bool.
* emit-rtl.cc (subreg_lowpart_p): Change return type from int to bool
and adjust function body accordingly.
(mem_expr_equal_p): Ditto.
(unshare_all_rtl): Change return type from int to void
and adjust function body accordingly.
(verify_rtx_sharing): Remove unneeded return.
(active_insn_p): Change return type from int to bool
and adjust function body accordingly.
(in_sequence_p): Ditto.

alias: Change return type of predicate functions from int to bool

Also remove a bunch of unneeded forward declarations.

gcc/ChangeLog:

* rtl.h (true_dependence): Change return type from int to bool.
(canon_true_dependence): Ditto.
(read_dependence): Ditto.
(anti_dependence): Ditto.
(canon_anti_dependence): Ditto.
(output_dependence): Ditto.
(canon_output_dependence): Ditto.
(may_alias_p): Ditto.
* alias.h (alias_sets_conflict_p): Ditto.
(alias_sets_must_conflict_p): Ditto.
(objects_must_conflict_p): Ditto.
(nonoverlapping_memrefs_p): Ditto.
* alias.cc (rtx_equal_for_memref_p): Remove forward declaration.
(record_set): Ditto.
(base_alias_check): Ditto.
(find_base_value): Ditto.
(mems_in_disjoint_alias_sets_p): Ditto.
(get_alias_set_entry): Ditto.
(decl_for_component_ref): Ditto.
(write_dependence_p): Ditto.
(memory_modified_1): Ditto.
(mems_in_disjoint_alias_set_p): Change return type from int to bool
and adjust function body accordingly.
(alias_sets_conflict_p): Ditto.
(alias_sets_must_conflict_p): Ditto.
(objects_must_conflict_p): Ditto.
(rtx_equal_for_memref_p): Ditto.
(base_alias_check): Ditto.
(read_dependence): Ditto.
(nonoverlapping_memrefs_p): Ditto.
(true_dependence_1): Ditto.
(true_dependence): Ditto.
(canon_true_dependence): Ditto.
(write_dependence_p): Ditto.
(anti_dependence): Ditto.
(canon_anti_dependence): Ditto.
(output_dependence): Ditto.
(canon_output_dependence): Ditto.
(may_alias_p): Ditto.
(init_alias_analysis): Change "changed" variable to bool.

RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

Base on V1 patch, adding comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><v_double_trunc><mode>2): Change
expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

RISC-V: Add testcase for vrsub.vi auto-vectorization

Apparently, we are missing vrsub.vi tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

Base on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt doesn't depend on FRM. So remove FRM preparing for mode switching support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

Base on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt.f.x.v doesn't depend on FRM. So remove FRM preparing for mode switching support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove FRM for vfncvt.rod instruction

Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Add pattern for bswap + rotate [PR 110039]

After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
pattern to match the new GIMPLE form.

With this patch, gcc.target/aarch64/rev16_2.c passes again.

2023-05-31 Christophe Lyon <christophe.lyon@linaro.org>

PR target/110039
gcc/
* config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
pattern.

libstdc++: Do not include <exception> in <mutex>

We previously needed <exception> in <mutex> for the std::lock_error
exception class, but that was moved out of <mutex> in 2009 when it was
removed from the C++0x draft. We can stop including <exception> now.

Move the include for <bits/error_constants.h> to <bits/unique_lock.h>
where it's actually used, and only include <errno.h> in <mutex> (for
EAGAIN and EDEADLK).

Also add some headers to <mutex> that are needed but are not included
directly: <bits/functexcept.h>, <bits/invoke.h> and <bits/move.h>.

libstdc++-v3/ChangeLog:

* include/bits/unique_lock.h: Include <bits/error_constants.h>
here for std::errc constants.
* include/std/mutex: Do not include <bits/error_constants.h> and
<exception> here.

libstdc++: Replace obsolete shell syntax in configure.ac

The current POSIX standard says that the -a and -o operators to the
'test' utility are obsolete, and the shell operators && and || should be
used instead.

libstdc++-v3/ChangeLog:

* configure.ac: Replace use of -o operator for test.
* configure: Regenerate.

libstdc++: Add missing noexcept to std::scoped_allocator_adaptor

The standard requires these constructors and accessors to be noexcept.

libstdc++-v3/ChangeLog:

* include/std/scoped_allocator (scoped_allocator_adaptor): Add
noexcept to all constructors except the default constructor.
(scoped_allocator_adaptor::inner_allocator): Add noexcept.
(scoped_allocator_adaptor::outer_allocator): Likewise.
* testsuite/20_util/scoped_allocator/noexcept.cc: New test.

libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

As suggested by Jakub in the PR, this just hardcodes the constants with
a Q suffix, since the properties of __float128 are not going to change.

We can only define it for non-strict modes because the suffix gives an
error otherwise, even in system headers:

limits:2085: error: unable to find numeric literal operator 'operator""Q'

libstdc++-v3/ChangeLog:

PR libstdc++/104772
* include/std/limits (numeric_limits<__float128>): Define.
* testsuite/18_support/numeric_limits/128bit.cc: New test.

libstdc++: Disable embedded tzdata for all 16-bit targets

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Extend logic for avr and
msp430 to all 16-bit targets.
* configure: Regenerate.

libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

We use the from_chars_strtod function with __strtof128 to read a
_Float128 value, but from_chars_strtod is not defined unless uselocale
is available. This can lead to compilation failures for some targets,
because we try to define the _Flaot128 overload in terms of a
non-existing from_chars_strtod function.

Only try to use __strtof128 if uselocale is available, otherwise
fallback to the long double overload of std::from_chars (which might
fallback to the double overload, which should use fast_float).

This ensures we always define the full set of overloads, even if they
are not always accurate for all values of the wider types.

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
(USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double is
binary64.
(from_chars(const char*, const char*, double&, chars_format)):
Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
(from_chars(const char*, const char*, _Float128&, chars_format))
Only use from_chars_strtod when USE_STRTOD_FOR_FROM_CHARS is
defined, otherwise parse a long double and convert to _Float128.

libstdc++: Deprecate std::setfill for std::basic_istream [PR109922]

Prior to N0966 (July 1996) the std::setfill manipulator was specified to
work with both input and output streams. In the final C++98 standard it
is only specified to work with output streams.

We have always supported it for input streams, despite that never being
in the standard, and having no meaning for any input streams defined by
the standard. This commit adds a deprecated attribute to the overload
for input streams, so that we can stop supporting this some day.

libstdc++-v3/ChangeLog:

PR libstdc++/109922
* include/std/iomanip (operator>>(basic_istream&, _Setfill)):
Add deprecated attribute to non-standard overload.
* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/*: Regenerate.
* testsuite/27_io/manipulators/standard/char/1.cc: Add
dg-warning for expected deprecated warning.
* testsuite/27_io/manipulators/standard/char/2.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/1.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/2.cc: Likewise.

ipa/109983 - (IPA) PTA speedup

This improves the edge avoidance heuristic by re-ordering the
topological sort of the graph to make sure the component with
the ESCAPED node is processed first. This improves the number
of created edges which directly correlates with the number
of bitmap_ior_into calls from 141447426 to 239596 and the
compile-time from 1083s to 3s. It also improves the compile-time
for the related PR109143 from 81s to 27s.

I've modernized the topological sorting API on the way as well.

PR ipa/109983
PR tree-optimization/109143
* tree-ssa-structalias.cc (struct topo_info): Remove.
(init_topo_info): Likewise.
(free_topo_info): Likewise.
(compute_topo_order): Simplify API, put the component
with ESCAPED last so it's processed first.
(topo_visit): Adjust.
(solve_graph): Likewise.

IPA PTA stats enhancement and non-details dump slimming

The following keeps track of the number of edges we avoid to create
because they redundandly feed ESCAPED. It also avoids printing
a header for -details when not using -details.

* tree-ssa-structalias.cc (constraint_stats::num_avoided_edges):
New.
(add_graph_edge): Count redundant edges we avoid to create.
(dump_sa_stats): Dump them.
(ipa_pta_execute): Do not dump generating constraints when
we are not dumping them.

aarch64: Simplify output template emission code for a few patterns

If the output code for a define_insn just does a switch (which_alternative) with no other computation we can almost always
replace it with more compact MD syntax for each alternative in a mult-alternative '@' block.
This patch cleans up some such patterns in the aarch64 backend, making them shorter and more concise.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite
output template to avoid explicit switch on which_alternative.
(*aarch64_simd_mov<VQMOV:mode>): Likewise.
(and<mode>3): Likewise.
(ior<mode>3): Likewise.
* config/aarch64/aarch64.md (*mov<mode>_aarch64): Likewise.

xtensa: Improve "*shlrd_reg" insn pattern and its variant

The insn "*shlrd_reg" shifts two registers with a funnel shifter by the
third register to get a single word result:

 reg0 = (reg1 SHIFT_OP0 reg3) BIT_JOIN_OP (reg2 SHIFT_OP1 (32 - reg3))

where the funnel left shift is SHIFT_OP0 := ASHIFT, SHIFT_OP1 := LSHIFTRT
and its right shift is SHIFT_OP0 := LSHIFTRT, SHIFT_OP1 := ASHIFT,
respectively. And also, BIT_JOIN_OP can be either PLUS or IOR in either
shift direction.

 [(set (match_operand:SI 0 "register_operand" "=a")
(match_operator:SI 6 "xtensa_bit_join_operator"
[(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 3 "register_operand" "r")])
(match_operator:SI 5 "logical_shift_operator"
[(match_operand:SI 2 "register_operand" "r")
(neg:SI (match_dup 3))])]))]

Although the RTL matching template can express it as above, there is no
way of direcing that the operator (operands[6]) that combines the two
individual shifts is commutative.
Thus, if multiple insn sequences matching the above pattern appear
adjacently, the combiner may accidentally mix them up and get partial
results.

This patch adds a new insn-and-split pattern with the two sides swapped
representation of the bit-combining operation that was lacking and
described above.

And also changes the other "*shlrd" variants from previously describing
the arbitraryness of bit-combining operations with code iterators to a
combination of the match_operator and the predicate above.

gcc/ChangeLog:

* config/xtensa/predicates.md (xtensa_bit_join_operator):
New predicate.
* config/xtensa/xtensa.md (ior_op): Remove.
(*shlrd_reg): Rename from "*shlrd_reg_<code>", and add the
insn_and_split pattern of the same name to express and capture
the bit-combining operation with both sides swapped.
In addition, replace use of code iterator with new operator
predicate.
(*shlrd_const, *shlrd_per_byte):
Likewise regarding the code iterator.

Fix ICE in rewrite_expr_tree_parallel

1. Limit the value of tree-reassoc-width to IntegerRange(0, 256).
2. Add width limit in rewrite_expr_tree_parallel.

gcc/ChangeLog:

PR tree-optimization/110038
* params.opt: Add a limit on tree-reassoc-width.
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Add width limit.

gcc/testsuite/ChangeLog:

PR tree-optimization/110038
* gcc.dg/pr110038.c: New test.

RISC-V: Add ZVFH extension to the -march= option

This patch would like to add new sub extension (aka ZVFH) to the -march= option.
To make it simple, only the sub extension itself is involved in this patch, and
the underlying FP16 related RVV intrinsic API depends on the TARGET_ZVFH.

The Zvfh extension depends on the Zve32f and Zfhmin extensions. You can locate
more information about ZVFH from below spec doc.

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfh item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFH): New macro.
(TARGET_ZVFH): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-21.c: New test.
* gcc.target/riscv/predef-27.c: New test.

RISC-V: Fix unreachable test code for init repeat sequence.

This patch fix one unreachable test code, which is for debugging purpose
without cleanup before commit.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c:
Remove debug code.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

Similar like WIDEN FLOAT_EXPR, when direct_optab is not existed, try
intermediate integer type whenever gimple ranger can tell it's safe.

.i.e.
When there's no direct optab for vector long long -> vector float, but
the value range of integer can be represented as int, try vector int
-> vector float if availble.

gcc/ChangeLog:

PR tree-optimization/108804
* tree-vect-patterns.cc (vect_get_range_info): Remove static.
* tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
Add new parameter narrow_src_p.
(vectorizable_conversion): Enhance NARROW FLOAT_EXPR
vectorization by truncating to lower precision.
* tree-vectorizer.h (vect_get_range_info): New declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108804.c: New test.

testsuite: add verify-sarif-file to some testcases that were missing it

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/malloc-sarif-1.c: Add missing verify-sarif-file
directive.
* gcc.dg/analyzer/sarif-pr107366.c: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[libstdc++] [testsuite] xfail double-prec from_chars for x86_64 ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on x86_64-*-vxworks*.

for libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on x86_64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
x86_64-vxworks.

testsuite/52641: Fix more of implicit int=32 assumption fallout.

gcc/testsuite/
PR testsuite/52641
* gcc.dg/torture/pr107451.c: Require int32plus.
* gcc.dg/torture/pr108574-3.c: Use __INT32_TYPE__ instead of int.
* gcc.dg/torture/pr109940.c: Use __INTPTR_TYPE__ instead of long.
* gcc.dg/torture/pr95248.c: Require size24plus.
* gcc.dg/torture/pr95295-3.c: Use var_* with at least 32 bits int.
* gcc.dg/torture/pr98640.c: Cast to __INT32_TYPE__ instead of int.
* gcc.dg/tree-ssa/pr103771.c: Use int with at least 32 bits.

LRA: Update insn sp offset if its input reload changes SP

The patch fixes a bug when there is input reload changing SP.  The bug was
triggered by switching H8300 target to LRA.  The insn in question is

(insn 21 20 22 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
        (reg/f:SI 31)) "j.c":10:3 19 {*movsi}
     (expr_list:REG_DEAD (reg/f:SI 31)
        (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
            (nil))))

The memory address is reloaded but the SP offset for the original insn was not updated.

gcc/ChangeLog:

* lra-int.h (lra_update_sp_offset): Add the prototype.
* lra.cc (setup_sp_offset): Change the return type.  Use
lra_update_sp_offset.
* lra-eliminations.cc (lra_update_sp_offset): New function.
(lra_process_new_insns): Push the current insn to reprocess if the
input reload changes sp offset.

i386: Fix misleading identation in i386-expand.cc [PR110041]

gcc/ChangeLog:

PR target/110041
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Fix misleading identation.

jump: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* rtl.h (comparison_dominates_p): Change return type from int to bool.
(condjump_p): Ditto.
(any_condjump_p): Ditto.
(any_uncondjump_p): Ditto.
(simplejump_p): Ditto.
(returnjump_p): Ditto.
(eh_returnjump_p): Ditto.
(onlyjump_p): Ditto.
(invert_jump_1): Ditto.
(invert_jump): Ditto.
(rtx_renumbered_equal_p): Ditto.
(redirect_jump_1): Ditto.
(redirect_jump): Ditto.
(condjump_in_parallel_p): Ditto.
* jump.cc (invert_exp_1): Adjust forward declaration.
(comparison_dominates_p): Change return type from int to bool
and adjust function body accordingly.
(simplejump_p): Ditto.
(condjump_p): Ditto.
(condjump_in_parallel_p): Ditto.
(any_uncondjump_p): Ditto.
(any_condjump_p): Ditto.
(returnjump_p): Ditto.
(eh_returnjump_p): Ditto.
(onlyjump_p): Ditto.
(redirect_jump_1): Ditto.
(redirect_jump): Ditto.
(invert_exp_1): Ditto.
(invert_jump_1): Ditto.
(invert_jump): Ditto.
(rtx_renumbered_equal_p): Ditto.

MAINTAINERS: Add myself to write after approval

2023-05-30 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

ChangeLog:
* MAINTAINERS (Write After Approval): Add myself.

testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
'wrong' version:
 invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
 invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.

2023-05-23 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.

Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison

This patch adds the support for match that was implemented for PR 87913 in phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses the range information if available which allows to produce MIN/MAX expression
when comparing against the lower/upper bound of the range instead of lower/upper
of the type.

minmax-20.c is the new testcase which tests the ranges part.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-22.c: New test.

MATCH: Move `a <= CST1 ? MAX<a, CST2> : a` optimization to match

This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.

There is one more change needed before being able to remove
minmax_replacement from phiopt.

A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1 so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1
so need to change the scanning.
* pr66726-5.c needs two phiopt passes currently to optimize
to the right thing, it needed 2 phiopt passes before, the cast
from int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before the fold was able to optimize it.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.

Fix ACLE data-intrinsics testcases

data-intrinsics-assembly.c forces -march=armv6 using dg-add-options
arm_arch_v6, which implicitly adds -mfloat-abi=softfp.

However, for a toolchain configured for arm-linux-gnueabihf and
--with-arch=armv7-a, the testcase will fail when including arm_acle.h
(which includes stdint.h, which will fail to include the non-existing
gnu/stubs-soft.h).

Other effective-targets related to arm_acle.h would also pass because
they first try without -mfloat-abi=softfp, so it seems the
simplest/safest is to add { dg-require-effective-target arm_softfp_ok }
to make sure arm_arch_v6_ok's assumption is valid.

The patch also fixes what seems to be an oversight in
data-intrinsics-armv6.c: it requires arm_arch_v6_ok, but uses
arm_arch_v6t2: the patch makes it require arm_arch_v6t2_ok.

2023-05-30 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/acle/data-intrinsics-armv6.c: Fix typo.
* gcc.target/arm/acle/data-intrinsics-assembly.c: Require
arm_softfp_ok.

libstdc++: Correct NTTP and simd_mask ctor call

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109822
* include/experimental/bits/simd.h (to_native): Use int NTTP
as specified in PTS2.
(to_compatible): Likewise. Add missing tag to call mask
generator ctor.
* testsuite/experimental/simd/pr109822_cast_functions.cc: New
test.

libstdc++: Simplify calculation of expected value in simd test

This avoids a failure on PR109964.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/integer_operators.cc:
Compute expected value differently to avoid getting turned into
a vector shift.

libstdc++: Fix test assumptions on long and long double

Expect that long might not fit into the long double mantissa bits.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/operator_cvt.cc: Make long
double <-> (u)long conversion tests conditional on sizeof(long
double) and sizeof(long).

Replace a HWI_COMPUTABLE_MODE_P with wide-int in simplify-rtx.cc.

This patch enhances one of the optimizations in simplify_binary_operation_1
to allow it to simplify RTL expressions in modes wider than HOST_WIDE_INT
by replacing a use of HWI_COMPUTABLE_MODE_P and UINTVAL with wide_int.

The motivating example is a pending x86_64 backend patch that produces
the following RTL in combine:

(and:TI (zero_extend:TI (reg:DI 89))
 (const_wide_int 0x0ffffffffffffffff))

where the AND is redundant, as the mask, ~0LL, is DImode's MODE_MASK.
There's already an optimization that catches this for narrower modes,
transforming (and:HI (zero_extend:HI (reg:QI x)) (const_int 0xff))
into (zero_extend:HI (reg:QI x)), but this currently only handles
CONST_INT not CONST_WIDE_INT. Fixed by upgrading this transformation
to use wide_int, specifically rtx_mode_t and wi::mask.

2023-05-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* simplify-rtx.cc (simplify_binary_operation_1) <AND>: Use wide-int
instead of HWI_COMPUTABLE_MODE_P and UINTVAL in transformation of
(and (extend X) C) as (zero_extend (and X C)), to also optimize
modes wider than HOST_WIDE_INT.

PR target/107172: Avoid "unusual" MODE_CC comparisons in simplify-rtx.cc

I believe that a better (or supplementary) fix to PR target/107172 is to
avoid producing incorrect (but valid) RTL in
simplify_const_relational_operation when presented with questionable
(obviously invalid) expressions, such as those produced during combine.
Just as with the "first do no harm" clause with the Hippocratic Oath,
simplify-rtx (probably) shouldn't unintentionally transform invalid RTL
expressions, into incorrect (non-equivalent) but valid RTL that may be
inappropriately recognized by recog.

In this specific case, many GCC backends represent their flags register via
MODE_CC, whose representation is intentionally "opaque" to the middle-end.
The only use of MODE_CC comprehensible to the middle-end's RTL optimizers
is relational comparisons between the result of a COMPARE rtx (op0) and zero
(op1). Any other uses of MODE_CC should be left alone, and some might argue
indicate representational issues in the backend.

In practice, CPUs occasionally have numerous instructions that affect the
flags register(s) other than comparisons [AVR's setc, powerpc's mtcrf,
x86's clc, stc and cmc and x86_64's ptest that sets C and Z flags in
non-obvious ways, c.f. PR target/109973]. Currently care has to be taken,
wrapping these in UNSPEC, to avoid combine inappropriately merging flags
setters with flags consumers (such as conditional jumps). It's safer to
teach simplify_const_relational_operation not to modify expressions that
it doesn't understand/recognize.

2023-05-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/107172
* simplify-rtx.cc (simplify_const_relational_operation): Return
early if we have a MODE_CC comparison that isn't a COMPARE against
const0_rtx.

OpenMP: Improve C/C++ parsing error message [PR109999]

Replace
error: expected '#pragma omp' clause before ...
by the the more readable/clearer
error: expected an OpenMP clause before ...

(And likewise for '#pragma acc' and OpenACC.)

PR c/109999

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_all_clauses,
c_parser_omp_all_clauses): Improve error wording.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_all_clauses,
cp_parser_omp_all_clauses): Improve error wording.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/asyncwait-1.c: Update dg-error.
* c-c++-common/goacc/clauses-fail.c: Likewise.
* c-c++-common/goacc/data-2.c: Likewise.
* c-c++-common/gomp/declare-target-2.c: Likewise.
* c-c++-common/gomp/directive-1.c: Likewise.
* g++.dg/goacc/data-1.C: Likewise.

RISC-V: Allow all const_vec_duplicates as constants.

As we can always broadcast an integer constant to a vector register
allow them in riscv_const_insns. We need as many instructions as
it takes to generate the constant and one vmv.vx.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Allow
const_vec_duplicates.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Add vmv.v.x
tests.
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Dito.

Detect bswap + rotate for byte permutation in pass_bswap.

The patch doesn't handle:
1. cast64_to_32,
2. memory source with rsize < range.

gcc/ChangeLog:

PR middle-end/108938
* gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New
function, cut from original find_bswap_or_nop function.
(find_bswap_or_nop): Add a new parameter, detect bswap +
rotate and save rotate result in the new parameter.
(bswap_replace): Add a new parameter to indicate rotate and
generate rotate stmt if needed.
(maybe_optimize_vector_constructor): Adjust for new rotate
parameter in the upper 2 functions.
(pass_optimize_bswap::execute): Ditto.
(imm_store_chain_info::output_merged_store): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108938-1.c: New test.
* gcc.target/i386/pr108938-2.c: New test.
* gcc.target/i386/pr108938-3.c: New test.
* gcc.target/i386/pr108938-load-1.c: New test.
* gcc.target/i386/pr108938-load-2.c: New test.

aarch64: Convert ADDLP and ADALP patterns to standard RTL codes

This patch converts the patterns for the integer widen and pairwise-add instructions
to standard RTL operations. The pairwise addition withing a vector can be represented
as an addition of two vec_selects, one selecting the even elements, and one selecting odd.
Thus for the intrinsic vpaddlq_s8 we can generate:
(set (reg:V8HI 92)
 (plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
 (parallel [
 (const_int 0 [0])
 (const_int 2 [0x2])
 (const_int 4 [0x4])
 (const_int 6 [0x6])
 (const_int 8 [0x8])
 (const_int 10 [0xa])
 (const_int 12 [0xc])
 (const_int 14 [0xe])
 ]))
 (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
 (parallel [
 (const_int 1 [0x1])
 (const_int 3 [0x3])
 (const_int 5 [0x5])
 (const_int 7 [0x7])
 (const_int 9 [0x9])
 (const_int 11 [0xb])
 (const_int 13 [0xd])
 (const_int 15 [0xf])
 ]))))

Similarly for the accumulating forms where there's an extra outer PLUS for the accumulation.
We already have the handy helper functions aarch64_stepped_int_parallel_p and
aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can make use of to define
the right predicate for the VEC_SELECT PARALLEL.

This patch allows us to remove some code iterators and the UNSPEC definitions for SADDLP and UADDLP.
UNSPEC_UADALP and UNSPEC_SADALP are retained because they are used by SVE2 patterns still.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
(aarch64_<su>adalp<mode>): New define_expand.
(*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<su>addlp<mode>): Convert to define_expand.
(*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
(ADALP): Likewise.
(USADDLP): Likewise.
* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.

aarch64: Reimplement v(r)hadd and vhsub intrinsics with RTL codes

This patch reimplements the MD patterns for the UHADD,SHADD,UHSUB,SHSUB,URHADD,SRHADD instructions using
standard RTL operations rather than unspecs. The correct RTL representations involves widening
the inputs before adding them and halving, followed by a truncation back to the original mode.
An unfortunate wart in the patch is that we end up having very similar expanders for the intrinsics
through the aarch64_<su>h<ADDSUB:optab><mode> and aarch64_<su>rhadd<mode> names and the standard names
for the vector averaging optabs <su>avg<mode>3_floor and <su>avg<mode>3_ceil.
I'd like to reuse <su>avg<mode>3_ceil for the intrinsics builtin as well but our scheme
in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only allowing mappings
of entries in aarch64-simd-builtins.def to:
 0 - CODE_FOR_aarch64_<name><mode>
 1-9 - CODE_FOR_<name><mode><1-9>
 10 - CODE_FOR_<name><mode>

whereas here we want a string after the <mode> i.e. CODE_FOR_uavg<mode>3_ceil.
This patch adds a bit of remapping logic in aarch64-builtins.cc before the construction of the
builtin info that remaps the CODE_FOR_* definitions in aarch64-simd-builtins.def to the
optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to CODE_FOR_avgv4si3_ceil, for example.
It's a bit specific to this case, but this solution requires the least invasive changes while avoiding
having duplicate expanders just for the sake of a different pattern name.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of
aarch64-builtin-iterators.h. Add definition to remap shadd, uhadd,
srhadd, urhadd builtin codes for standard optab ones.
* config/aarch64/aarch64-simd.md (avg<mode>3_floor): Rename to...
(<su_optab>avg<mode>3_floor): ... This. Expand to RTL codes rather than
unspec.
(avg<mode>3_ceil): Rename to...
(<su_optab>avg<mode>3_ceil): ... This. Expand to RTL codes rather than
unspec.
(aarch64_<su>hsub<mode>): New define_expand.
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): Split into...
(*aarch64_<su>h<ADDSUB:optab><mode><vczle><vczbe>_insn): ... This...
(*aarch64_<su>rhadd<mode><vczle><vczbe>_insn): ... And this.

riscv: add work around for PR sanitizer/82501

gcc/testsuite/
PR sanitizer/82501
* c-c++-common/asan/pointer-compare-1.c: Disable use of small data
on RISC-V.

riscv: update riscv_asan_shadow_offset

gcc/
PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.

stor-layout, aarch64: Express SRA intrinsics with RTL codes

This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:

* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.

* For the RSRA variants that add a rounding constant (1 << (shift-1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly. In RTL this can be represented with an appropriate
extend operation followed by a truncate back to the original modes.
However for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening i.e. we'd need a V4DI mode to represent the
intermediate widened result. This patch defines such modes for
V16HI,V8SI,V4DI,V2TI. These will come handy in the future too as we have
more Advanced SIMD instruction that have similar intermediate widening
semantics.

* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not indended to be moved to and from register or memory,
assigned to types, used as TYPE_MODE or participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes. However, the code in stor-layout.cc:<mode_for_vector>
explicitly doesn't check this when picking a TYPE_MODE due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as:
typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
fixed-length SVE modes are available and vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
Declare prototype.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to...
(aarch64_<sra_op>sra_n<mode>_insn): ... This.
(aarch64_<sra_op>rsra_n<mode>_insn): New define_insn.
(aarch64_<sra_op>sra_n<mode>): New define_expand.
(aarch64_<sra_op>rsra_n<mode>): Likewise.
(aarch64_<sur>sra_n<mode>): Rename to...
(aarch64_<sur>sra_ndi): ... This.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
any_target_p argument.
(aarch64_extract_vec_duplicate_wide_int): Define.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
(aarch64_const_vec_rnd_cst_p): Likewise.
(aarch64_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
(VSRA): Adjust for the above.
(sur): Likewise.
(V2XWIDE): New mode_attr.
(vec_or_offset): Likewise.
(SHIFTEXTEND): Likewise.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
predicate.
* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
* doc/tm.texi.in: Regenerate.
* stor-layout.cc (mode_for_vector): Check
vector_mode_supported_any_target_p when iterating through vector modes.
* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.

ada: Fix wrong access for qualified aggregate with storage model

The previous fix to get_storage_model_access was incomplete and needs to be
extended to the node itself.

gcc/ada/

* gcc-interface/trans.cc (get_storage_model_access): Also strip any
type conversion in the node when unwinding the components.

ada: Fix internal error on qualified aggregate with storage model

It comes from a small oversight in get_storage_model_access.

gcc/ada/

* gcc-interface/trans.cc (node_is_component): Remove parentheses.
(node_is_type_conversion): New predicate.
(get_atomic_access): Use it.
(get_storage_model_access): Likewise and look into the parent to
find a component if it returns true.
(present_in_lhs_or_actual_p): Likewise.

ada: Add missing guards for degenerate storage models

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Check that
the storage model has Copy_From before instantiating loads for it.
<Attr_Length>: Likewise.
<Attr_Bit_Position>: Likewise.
(gnat_to_gnu) <N_Indexed_Component>: Likewise.
<N_Slice>: Likewise.

ada: Fix incorrect copies being used with 'Address

When using 'Address on an object with a size clause, gigi would end up
creating a copy and using its address instead of the one of the original
object, leading to incorrect behavior. Remove the conversion (that
triggers the copy) when 'Address is applied to a declaration.

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu): Also strip conversion
in case of DECL.

ada: Fix bogus Storage_Error on dynamic array with static zero length

This works around the limitations present for the support of arrays in the
middle-end by clearing the TREE_OVERFLOW flag for arrays with zero length.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Array_Type>: Use a
local variable for the GNAT index type.
<E_Array_Subtype>: Likewise. Call Is_Null_Range on the bounds and
force the zero on TYPE_SIZE and TYPE_SIZE_UNIT if it returns true.

ada: Fix minor issue with Mod operator

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu) <N_Op_Mod>: Test the
precision of the operation rather than that of the result type.

ada: Minor generic tweaks left and and right

No functional changes.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Replace
integer_zero_node with null_pointer_node for pointer types.
* gcc-interface/trans.cc (gnat_gimplify_expr) <NULL_EXPR>: Likewise.
* gcc-interface/utils.cc (maybe_pad_type): Do not attempt to make a
packable type from a fat pointer type.
* gcc-interface/utils2.cc (build_atomic_load): Use a local variable.
(build_atomic_store): Likewise.

ada: Make internal_error_function more robust

gcc/ada/

* gcc-interface/misc.cc (internal_error_function): Be prepared for
an input_location set to UNKNOWN_LOCATION.

ada: Adjust again the implementation of storage models

The code generator must now be prepared to translate assignment statements
to objects allocated with a storage model and that are not initialized yet.

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Tweak.
(gnat_to_gnu) <N_Assignment_Statement>: Declare a local variable.
For a target with a storage model, use the Actual_Designated_Subtype
to compute the size if it is present.

ada: Simplify the implementation of storage models

As the additional temporaries required by the semantics of nonnative storage
models are now created by the front-end, in particular for actual parameters
and assignment statements, the corresponding code in gigi can be removed.

gcc/ada/

* gcc-interface/trans.cc (Call_to_gnu): Remove code implementing the
by-copy semantics for actuals with nonnative storage models.
(gnat_to_gnu) <N_Assignment_Statement>: Remove code instantiating a
temporary for assignments between nonnative storage models.

ada: Make use of Cannot_Be_Superflat flag on N_Range nodes

gcc/ada/

* gcc-interface/decl.cc (range_cannot_be_superflat): Return true
immediately if Cannot_Be_Superflat is set.
* gcc-interface/misc.cc (gnat_post_options): Do not override the
-Wstringop-overflow setting.

ada: Disable PIE mode during the build of the Ada front-end

This also removes some obsolete stuff.

gcc/ada/

* gcc-interface/Make-lang.in (ADA_CFLAGS): Move up.
(ALL_ADAFLAGS): Add $(NO_PIE_CFLAGS).
(ada/mdll.o): Remove.
(ada/mdll-fil.o): Likewise.
(ada/mdll-utl.o): Likewise.

ada: Fix storage model handling for dereference as lvalue and renamings

Don't require storage access for explicit dereferences used as
lvalue (e.g. Some_Access.all'Address) or for renamings.

gcc/ada/

* gcc-interface/trans.cc (get_storage_model_access): Don't require
storage model access for dereference used as lvalue or renamings.

ada: Small cleanups and fixes in expansion of aggregates

This streamlines the handling of qualified expressions in the expansion of
aggregates and plugs a couple of loopholes that may cause memory leaks.

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code): Move the declaration of Typ
to the beginning.
(Initialize_Array_Component): Test the unqualified version of the
expression for the nested array case.
(Initialize_Ctrl_Array_Component): Do not duplicate the expression
here. Do the pattern matching of the unqualified version of it.
(Gen_Assign): Call Unqualify to compute Expr_Q and use Expr_Q in
subsequent pattern matching.
(Initialize_Ctrl_Record_Component): Do the pattern matching of the
unqualified version of the aggregate.
(Build_Record_Aggr_Code): Call Unqualify.
(Convert_Aggr_In_Assignment): Likewise.
(Convert_Aggr_In_Object_Decl): Likewise.
(Component_OK_For_Backend): Likewise.
(Is_Delayed_Aggregate): Likewise.

ada: Fix wrong expansion of array aggregate with noncontiguous choices

This extends an earlier fix done for the others choice of an array aggregate
to all the choices of the aggregate, since the same sharing issue may happen
when the choices are not contiguous.

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code.Get_Assoc_Expr): Duplicate the
expression here instead of...
(Build_Array_Aggr_Code): ...here.

ada: Fix internal error on array constant in expression function

This happens when the peculiar check emitted by Check_Large_Modular_Array
is applied to an object whose actual subtype is an itype with dynamic size,
because the first reference to the itype in the expanded code may turn out
to be within the raise statement, which is problematic for the eloboration
of this itype by the code generator at library level.

gcc/ada/

* freeze.adb (Check_Large_Modular_Array): Fix head comment, use
Standard_Long_Long_Integer_Size directly and generate a reference
just before the raise statement if the Etype of the object is an
itype declared in an open scope.

ada: Fix fallout of recent fix for missing finalization

The original fix makes it possible to create transient scopes around return
statements in more cases, but it overlooks that transient scopes are reused
and, in particular, that they can be promoted to secondary stack management.

gcc/ada/

* exp_ch7.adb (Find_Enclosing_Transient_Scope): Return the index in
the scope table instead of the scope's entity.
(Establish_Transient_Scope): If an enclosing scope already exists,
do not set the Uses_Sec_Stack flag on it if the node to be wrapped
is a return statement which requires secondary stack management.

ada: Add System.Traceback.Symbolic.Module_Name support on AArch64 Linux

This commit changes the runtime on aarch64-linux to use the Linux
version of s-tsmona.adb, so as to add support for this functionality
on aarch64-linux.

gcc/ada/

* Makefile.rtl: Use libgnat/s-tsmona__linux.adb on
aarch64-linux. Link libgnat with -ldl, as the use of
s-tsmona__linux.adb requires it.

ada: Only build access-to-subprogram wrappers when expander is active

For access-to-subprogram types with Pre/Post aspects we create a wrapper
routine that evaluates these aspects. Spec of this wrapper was created
always, while its body was only created when expansion was enabled.

Now we only create these wrappers when expansion is enabled. In
particular, we don't create them in GNATprove mode; instead, GNATprove
picks the Pre/Post expressions directly from the aspects.

gcc/ada/

* exp_ch3.adb
(Build_Access_Subprogram_Wrapper_Body): Build wrapper body if requested
by routine that builds wrapper spec.
* sem_ch3.adb
(Analyze_Full_Type_Declaration): Only build wrapper when expander is
active.
(Build_Access_Subprogram_Wrapper):
Remove special-case for GNATprove.

ada: Fix minor issues in user's guide

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor issues.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Fix minor issues.
* gnat_ugn.texi: Regenerate.

ada: Ensure Default_Stack_Size is greater than Minimum_Stack_Size

The Default_Stack_Size function does not check that the binder specified
default stack size is greater than the minimum stack size for the runtime.
This can result in tasks using default stack sizes less than the minimum
stack size because the Adjust_Storage_Size only adjusts storages sizes for
tasks that explicitly specify a storage size. To avoid this, the binder
specified default stack size is round up to the minimum stack size if
required.

gcc/ada/

* libgnat/s-parame.adb: Check that Default_Stack_Size >=
Minimum_Stack_size.
* libgnat/s-parame__rtems.adb: Ditto.
* libgnat/s-parame__vxworks.adb: Check that Default_Stack_Size >=
Minimum_Stack_size and use the proper Minimum_Stack_Size if
Stack_Check_Limits is enabled.

ada: Fix regression of secondary stack management in return statements

This happens when the expression of the return statement is a call that does
not return on the same stack as the enclosing function.

gcc/ada/

* sem_res.adb (Resolve_Call): Restrict previous change to calls that
return on the same stack as the enclosing function. Tidy up.

ada: Use generalized loop iteration in Put_Image routines

gcc/ada/

* libgnat/a-cidlli.adb (Put_Image): Simplify.
* libgnat/a-coinve.adb (Put_Image): Likewise.

ada: Fix visibility error with DIC or Type_Invariant aspect on generic type

The compiler fails to capture global references during the analysis of the
aspect on the generic type because it analyzes a copy of the expression.

gcc/ada/

* exp_util.adb (Build_DIC_Procedure_Body.Add_Own_DIC): When inside
a generic unit, preanalyze the expression directly.
(Build_Invariant_Procedure_Body.Add_Own_Invariants): Likewise.

ada: Fix coding style in init.c

The coding style rules require to avoid using FIXME comments. ??? is
preferred.

gcc/ada/

* init.c: Replace FIXME by ???

Handle FMA friendly in reassoc pass

Make some changes in reassoc pass to make it more friendly to fma pass later.
Using FMA instead of mult + add reduces register pressure and insruction
retired.

There are mainly two changes
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more fma and reducing the loss of FMA when breaking
the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
chains according to the given correlation width, keeping the FMA chance as
much as possible.

With the patch applied

On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy .
503.bwaves_r : Improved by 0.60% for single copy .
507.cactuBSSN_r: Improved by 1.10% for single copy .
519.lbm_r : Improved by 2.21% for single copy .
no measurable changes for other benchmarks.

On aarch64
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r : Improved by 6.00% for single-copy.
no measurable changes for other benchmarks.

TEST1:

float
foo (float a, float b, float c, float d, float *e)
{
 return *e + a * b + c * d ;
}

For "-Ofast -mfpmath=sse -mfma" GCC generates:
 vmulss %xmm3, %xmm2, %xmm2
 vfmadd132ss %xmm1, %xmm2, %xmm0
 vaddss (%rdi), %xmm0, %xmm0
 ret

With this patch GCC generates:
 vfmadd213ss (%rdi), %xmm1, %xmm0
 vfmadd231ss %xmm2, %xmm3, %xmm0
 ret

TEST2:

for (int i = 0; i < N; i++)
{
 a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + m[i]* o[i] + p[i];
}

For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmovapd e(%rax), %ymm4
vmulpd d(%rax), %ymm4, %ymm3
addq $32, %rax
vmovapd c-32(%rax), %ymm5
vmovapd j-32(%rax), %ymm6
vmulpd h-32(%rax), %ymm6, %ymm2
vmovapd a-32(%rax), %ymm6
vaddpd p-32(%rax), %ymm6, %ymm0
vmovapd g-32(%rax), %ymm7
vfmadd231pd b-32(%rax), %ymm5, %ymm3
vmovapd o-32(%rax), %ymm4
vmulpd m-32(%rax), %ymm4, %ymm1
vmovapd l-32(%rax), %ymm5
vfmadd231pd f-32(%rax), %ymm7, %ymm2
vfmadd231pd k-32(%rax), %ymm5, %ymm1
vaddpd %ymm3, %ymm0, %ymm0
vaddpd %ymm2, %ymm0, %ymm0
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L4
vzeroupper
ret

with this patch applied GCC breaks the chain with width = 2 and generates 6 fma:

vmovapd a(%rax), %ymm2
vmovapd c(%rax), %ymm0
addq $32, %rax
vmovapd e-32(%rax), %ymm1
vmovapd p-32(%rax), %ymm5
vmovapd g-32(%rax), %ymm3
vmovapd j-32(%rax), %ymm6
vmovapd l-32(%rax), %ymm4
vmovapd o-32(%rax), %ymm7
vfmadd132pd b-32(%rax), %ymm2, %ymm0
vfmadd132pd d-32(%rax), %ymm5, %ymm1
vfmadd231pd f-32(%rax), %ymm3, %ymm0
vfmadd231pd h-32(%rax), %ymm6, %ymm1
vfmadd231pd k-32(%rax), %ymm4, %ymm0
vfmadd231pd m-32(%rax), %ymm7, %ymm1
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L2
vzeroupper
ret

gcc/ChangeLog:

PR tree-optimization/98350
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Rewrite this function.
(rank_ops_for_fma): New.
(reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:

PR tree-optimization/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.

rtlanal: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* rtl.h (rtx_addr_can_trap_p): Change return type from int to bool.
(rtx_unstable_p): Ditto.
(reg_mentioned_p): Ditto.
(reg_referenced_p): Ditto.
(reg_used_between_p): Ditto.
(reg_set_between_p): Ditto.
(modified_between_p): Ditto.
(no_labels_between_p): Ditto.
(modified_in_p): Ditto.
(reg_set_p): Ditto.
(multiple_sets): Ditto.
(set_noop_p): Ditto.
(noop_move_p): Ditto.
(reg_overlap_mentioned_p): Ditto.
(dead_or_set_p): Ditto.
(dead_or_set_regno_p): Ditto.
(find_reg_fusage): Ditto.
(find_regno_fusage): Ditto.
(side_effects_p): Ditto.
(volatile_refs_p): Ditto.
(volatile_insn_p): Ditto.
(may_trap_p_1): Ditto.
(may_trap_p): Ditto.
(may_trap_or_fault_p): Ditto.
(computed_jump_p): Ditto.
(auto_inc_p): Ditto.
(loc_mentioned_in_p): Ditto.
* rtlanal.cc (computed_jump_p_1): Adjust forward declaration.
(rtx_unstable_p): Change return type from int to bool
and adjust function body accordingly.
(rtx_addr_can_trap_p): Ditto.
(reg_mentioned_p): Ditto.
(no_labels_between_p): Ditto.
(reg_used_between_p): Ditto.
(reg_referenced_p): Ditto.
(reg_set_between_p): Ditto.
(reg_set_p): Ditto.
(modified_between_p): Ditto.
(modified_in_p): Ditto.
(multiple_sets): Ditto.
(set_noop_p): Ditto.
(noop_move_p): Ditto.
(reg_overlap_mentioned_p): Ditto.
(dead_or_set_p): Ditto.
(dead_or_set_regno_p): Ditto.
(find_reg_fusage): Ditto.
(find_regno_fusage): Ditto.
(remove_node_from_insn_list): Ditto.
(volatile_insn_p): Ditto.
(volatile_refs_p): Ditto.
(side_effects_p): Ditto.
(may_trap_p_1): Ditto.
(may_trap_p): Ditto.
(may_trap_or_fault_p): Ditto.
(computed_jump_p): Ditto.
(auto_inc_p): Ditto.
(loc_mentioned_in_p): Ditto.
* combine.cc (can_combine_p): Update indirect function.

RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

Even though we can't support floating-point operations which are
depending
on FRM yet, (for example vfadd support is blocked) since the RVV
intrinsic doc is not updated
and we can't support mode switching for this.

We can support floating-point to integer conversion now since it's not
depending on FRM and
we don't need mode switching support for this ('rtz' conversions
independent FRM).

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode><vconvert>2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.

RISC-V: Fix warning in riscv.md

Notice there is warning:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
 if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
 else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function â€˜rtx_def*
gen_anddi3(rtx, rtx, rtx)â€™:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
 if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
 else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))

Add unsigned conversion to fix this warning.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/riscv.md: Fix signed and unsigned comparison
warning.

RISC-V: Add RVV FNMA auto-vectorization support

Like FMA, Add FNMA (VNMSAC or VNMSUB) auto-vectorization support.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/autovec.md (fnma<mode>4): New pattern.
(*fnma<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.

Daily bump.

RISC-V: Optimize TARGET_XTHEADCONDMOV

This patch allows less instructions to be used when TARGET_XTHEADCONDMOV is enabled.

Provide an example from the existing testcases.

Testcase:
int ConEmv_imm_imm_reg(int x, int y){
 if (x == 1000) return 10;
 return y;
}

Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d

before patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,zero,a5
th.mveqz a1,zero,a5
or a0,a0,a1
ret

after patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,a1,a5
ret

Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move_onesided):
Delete.
(riscv_expand_conditional_move): Reuse the TARGET_SFB_ALU expand
process for TARGET_XTHEADCONDMOV

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.

i386: Also require TARGET_AVX512BW to generate truncv16hiv16qi2 [PR110021]

gcc/ChangeLog:

PR target/110021
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require
TARGET_AVX512BW to generate truncv16hiv16qi2.