git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

RISC-V: Support const vector expansion with step vector with base != 0

Currently, we are able to generate step vector with base == 0:
{ 0, 0, 2, 2, 4, 4, ... }

ASM:

vid
vand

However, we do wrong for step vector with base != 0:
{ 1, 1, 3, 3, 5, 5, ... }

Before this patch, such case will run fail.

After this patch, we are able to pass the testcase and generate the step vector with asm:

vid
vand
vadd

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix stepped vector
with base != 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-19.c: New test.

docs: Add @cindex for some attributes

While looking for the access attribute,
I tried to find it via the concept index but it was
missing. This patch fixes that and adds one for
interrupt/interrupt_handler too.

Committed as obvious after building the HTML docs
and looking at the resulting concept index page.

gcc/ChangeLog:

* doc/extend.texi (access attribute): Add
cindex for it.
(interrupt/interrupt_handler attribute):
Likewise.

libstdc++: Synchronize PSTL with upstream

This patch rebases the C++17 parallel algorithms implementation (pstl)
against the current upstream version, commit 843c12d6a.

This version does not currently include the recently added OpenMP
backend, that will be considered for a future version.

libstdc++-v3/ChangeLog:
* include/pstl/algorithm_fwd.h: Synchronize with upstream.
* include/pstl/algorithm_impl.h: Likewise.
* include/pstl/execution_defs.h: Likewise.
* include/pstl/execution_impl.h: Likewise.
* include/pstl/glue_algorithm_impl.h: Likewise.
* include/pstl/glue_execution_defs.h: Likewise.
* include/pstl/glue_memory_impl.h: Likewise.
* include/pstl/glue_numeric_impl.h: Likewise.
* include/pstl/memory_impl.h: Likewise.
* include/pstl/numeric_fwd.h: Likewise.
* include/pstl/numeric_impl.h: Likewise.
* include/pstl/parallel_backend.h: Likewise.
* include/pstl/parallel_backend_serial.h: Likewise.
* include/pstl/parallel_backend_tbb.h: Likewise.
* include/pstl/parallel_impl.h: Likewise.
* include/pstl/pstl_config.h: Likewise.
* include/pstl/unseq_backend_simd.h: Likewise.
* include/pstl/utils.h: Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_construct.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_copy_move.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/pstl/uninitialized_fill_destroy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/inplace_merge.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_merge/merge.cc: Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/copy_if.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/copy_move.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/fill.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/generate.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/is_partitioned.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/partition.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/partition_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/remove.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/remove_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/replace.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/replace_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/swap_ranges.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_unary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/unique.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_modifying_operations/unique_copy_equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/adjacent_find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/all_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/any_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/count.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/equal.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_end.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_first_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_if.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/for_each.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/none_of.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/reverse_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/search_n.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_heap.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/is_sorted.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort_copy.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/sort.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/adjacent_difference.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/reduce.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/scan.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/transform_reduce.cc:
Likewise.
* testsuite/26_numerics/pstl/numeric_ops/transform_scan.cc:
Likewise.
* testsuite/util/pstl/test_utils.h:
Likewise.

compiler: support -fgo-importcfg

* lang.opt (fgo-importcfg): New option.
* go-c.h (struct go_create_gogo_args): Add importcfg field.
* go-lang.cc (go_importcfg): New static variable.
(go_langhook_init): Set args.importcfg.
(go_langhook_handle_option): Handle -fgo-importcfg.
* gccgo.texi (Invoking gccgo): Document -fgo-importcfg.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/506095

aarch64: Use <DWI> instead of <V2XWIDE> in scalar SQRSHRUN pattern

In the scalar pattern for SQRSHRUN it's a bit clearer to use DWI instead of V2XWIDE
to make it more clear that no vector modes are involved.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqrshrun_n<mode>_insn):
Use <DWI> instead of <V2XWIDE>.
(aarch64_sqrshrun_n<mode>): Likewise.

aarch64: Clean up some rounding immediate predicates

aarch64_simd_rsra_rnd_imm_vec is now used for more than just RSRA
and accepts more than just vectors so rename it to make it more
truthful.
The aarch64_simd_rshrn_imm_vec is now unused and can be deleted.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec):
Rename to...
(aarch64_int_rnd_operand): ... This.
(aarch64_simd_rshrn_imm_vec): Delete.
* config/aarch64/aarch64-simd.md (aarch64_<sra_op>rsra_n<mode>_insn):
Adjust for the above.
(aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): Likewise.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise.
(*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_le): Likewise.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_be): Likewise.
(aarch64_sqrshrun2_n<mode>_insn_le): Likewise.
(aarch64_sqrshrun2_n<mode>_insn_be): Likewise.
* config/aarch64/aarch64.cc (aarch64_const_vec_rsra_rnd_imm_p):
Rename to...
(aarch64_rnd_imm_p): ... This.

libstdc++: Fix std::format for pointers [PR110239]

The formatter for pointers was casting to uint64_t which sign extends a
32-bit pointer and produces a value that won't fit in the provided
buffer. Cast to uintptr_t instead.

There was also a bug in the __parse_integer helper when converting a
wide string to a narrow string in order to use std::from_chars on it.
The function would always try to read 32 characters, even if the format
string was shorter than that. Fix that bug, and remove the constexpr
implementation of __parse_integer by just using __from_chars_alnum
instead of from_chars, because that's usable in constexpr even in
C++20.

libstdc++-v3/ChangeLog:

PR libstdc++/110239
* include/std/format (__format::__parse_integer): Fix buffer
overflow for wide chars.
(formatter<const void*, C>::format): Cast to uintptr_t instead
of uint64_t.
* testsuite/std/format/string.cc: Test too-large widths.

libstdc++: Implement P2538R1 ADL-proof std::projected

This was recently approved for C++26, but there's no harm in
implementing it unconditionally for C++20 and C++23. As it says in the
paper, it doesn't change the meaning of any valid code. It only enables
things that were previously ill-formed for questionable reasons.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (projected): Replace class
template with alias template denoting an ADL-proofed helper.
(incremental_traits<projected<Iter, Proj>>): Remove.
* testsuite/24_iterators/indirect_callable/projected-adl.cc:
New test.

libstdc++: Qualify calls to debug mode helpers

These functions should be qualified to disable unwanted ADL.

The overload of __check_singular_aux for safe iterators was previously
being found by ADL, because it wasn't declared before __check_singular.
Add a declaration so that it can be found by qualified lookup.

libstdc++-v3/ChangeLog:

* include/debug/helper_functions.h (__get_distance)
(__check_singular, __valid_range_aux, __valid_range): Qualify
calls to disable ADL.
(__check_singular_aux(const _Safe_iterator_base*)): Declare
overload that was previously found via ADL.

IBM zSystems: Assume symbols without explicit alignment to be ok

A change we have committed back in 2015 relies on the backend
requested ABI alignment to be applied to ALL symbols by the
middle-end. However, this does not appear to be the case for external
symbols. With this commit we assume all symbols without explicit
alignment to be aligned according to the ABI. That's the behavior we
had before.
This fixes a performance regression caused by the 2015 patch. Since
then the address of external char type symbols have been pushed to the
literal pool, although it is safe to access them with larl (which
requires symbols to reside at even addresses).

gcc/
* config/s390/s390.cc (s390_encode_section_info): Set
SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitely been
misaligned.

gcc/testsuite/
* gcc.target/s390/larl-1.c: New test.

Fix profile of forwarders produced by cd-dce

compiling the testcase from PR109849 (which uses std:vector based stack to
drive a loop) with profile feedbakc leads to profile mismatches introduced by
tree-ssa-dce. This is the new code to produce unified forwarder blocks for
PHIs.

I am not including the testcase itself since
checking it for Invalid sum is probably going to be too fragile and this should
show in our LNT testers. The patch however fixes the mismatch.

Bootstrapped/regtested x86_64-linux and plan to commit it shortly.

gcc/ChangeLog:

PR tree-optimization/109849
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile
count of newly constructed forwarder block.

docs: Fix typo

gcc/ChangeLog:

* doc/optinfo.texi: Fix "steam" -> "stream".

DSE: Add LEN_MASK_STORE analysis into DSE and fix LEN_STORE

Hi, Richi.

This patch is adding LEN_MASK_STORE into DSE.

My understanding is LEN_MASK_STORE is predicated by mask and len.
No matter len is constant or not, the ao_ref should be the same as MASK_STORE.

Wheras for LEN_STORE, when len is constant, we use (len - bias), otherwise, it's
the same as MASK_STORE/LEN_MASK_STORE.

Not sure whether I am on the same page with you, feel free to correct me.

Thanks.

gcc/ChangeLog:

* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE and
fix LEN_STORE.
(dse_optimize_stmt): Add LEN_MASK_STORE.

GIMPLE_FOLD: Fix gimple fold for LEN_{MASK}_{LOAD,STORE}

Hi, previous I made a mistake on GIMPLE_FOLD of LEN_MASK_{LOAD,STORE}.

We should fold LEN_MASK_{LOAD,STORE} (bias+len) == vf (nunits instead of bytesize) && mask = all trues mask

into:
MEM_REF [...].

This patch added testcase to test gimple fold of LEN_MASK_{LOAD,STORE}.

Also, I fix LEN_LOAD/LEN_STORE, to make them have the same behavior.

Ok for trunk ?

gcc/ChangeLog:

* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Fix gimple
fold of LOAD/STORE with length.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: New test.

Avoid redundant GORI calcuations.

When GORI evaluates a statement, if operand 1 and 2 are both in the
dependency chain, GORI evaluates the name through both operands sequentially
and combines the results.

If either operand is in the dependency chain of the other, this
evaluation will do the same work twice, for questionable gain.
Instead, simple evaluate only the operand which depends on the other
and keep the evaluation linear in time.

* gimple-range-gori.cc (compute_operand1_and_operand2_range):
Check for interdependence between operands 1 and 2.

vect: Cost intermediate conversions

g:6f19cf7526168f8 extended N-vector to N-vector conversions
to handle cases where an intermediate integer extension or
truncation is needed. This patch adjusts the cost to account
for these intermediate conversions.

gcc/
* tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt
into account when costing non-widening/truncating conversions.

tree-optimization/110381 - preserve SLP permutation with in-order reductions

The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.

Now, reduction analysis has not yet been performend when optimizing
permutations so we have to resort to check that ourselves.

PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.

* gcc.dg/vect/pr110381.c: New testcase.

RISC-V: Remove duplicated extern function_base decl

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.

narrowing initializers and initializer_constant_valid_p_1

initializer_constant_valid_p_1 attempts to handle narrowing
differences and sums but fails to handle when the overall
value looks like

  VIEW_CONVERT_EXPR<long long int>(NON_LVALUE_EXPR <v>
    -  VEC_COND_EXPR < { 0, 0 } == { 0, 0 } , { -1, -1 } , { 0, 0 } > )

where endtype is scalar integer but value is a vector type.
In this particular case all is good and we recurse since
two vector lanes is more than 64bits of long long.  But still
it compares apples and oranges.

Fixed by appropriately also requiring the type of the
value to be scalar integral.

* varasm.cc (initializer_constant_valid_p_1): Also
constrain the type of value to be scalar integral
before dispatching to narrowing_initializer_constant_valid_p.

Avoid shorten_binary_op on VECTOR_TYPE

When we disallow TYPE_PRECISION on VECTOR_TYPEs it shows that
shorten_binary_op performs some checks on that that are likely
harmless in the end. The following bails out early for
VECTOR_TYPE operations to avoid those questionable checks.

gcc/c-family/
* c-common.cc (shorten_binary_op): Exit early for VECTOR_TYPE
operations.

Fix TYPE_PRECISION use in hashable_expr_equal_p

While the checks look unnecessary they probably are quick and
thus done early. The following avoids using TYPE_PRECISION
on VECTOR_TYPEs by making the code match the comment which
talks about precision and signedness. An alternative would
be to only retain the ERROR_MARK and TYPE_MODE checks or
use TYPE_PRECISION_RAW (but I like that least).

* tree-ssa-scopedtables.cc (hashable_expr_equal_p):
Use element_precision.

RISC-V: Remove redundant vcond patterns

Previously, Richi has suggested that vcond patterns are only needed when target
support comparison + select consuming 1 instruction.

Now, I do the experiments on removing those "vcond" patterns, it works perfectly.

All testcases PASS.

Really appreicate Richi helps us recognize such issue.

Now remove all "vcond" patterns as Richi suggested.

gcc/ChangeLog:

* config/riscv/autovec.md (vcond<V:mode><VI:mode>): Remove redundant
vcond patterns.
(vcondu<V:mode><VI:mode>): Ditto.
* config/riscv/riscv-protos.h (expand_vcond): Ditto.
* config/riscv/riscv-v.cc (expand_vcond): Ditto.

tree-optimization/110392 - ICE with predicate analysis

Feeding not optimized IL can result in predicate normalization
to simplify things so a predicate can get true or false. The
following re-orders the early exit in that case to come after
simplification and normalization to take care of that.

PR tree-optimization/110392
* gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded):
Do early exits on true/false predicate only after normalization.

SCCVN: Fix repeating variable name "len"

Line 3292: has variable name "len": tree mask = NULL_TREE, len = NULL_TREE, bias = NULL_TREE;
Line 3349: has variable name "len": HOST_WIDE_INT start = 0, len = 0;

Since they are never used simultaneously, such issue is not recognized for now.
However, I want to add LEN_MASK_{LOAD,STORE} which will need these 2 variables, so fix naming in this path.

Change HOST_WIDE_INT start = 0, len = 0; into HOST_WIDE_INT start = 0, length = 0;

gcc/ChangeLog:

* tree-ssa-sccvn.cc (vn_reference_lookup_3): Change name "len" into
"length".

i386: New *ashl<dwi3>_doubleword_highpart define_insn_and_split.

This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).

__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}

The hidden issue is that the RTL currently seen by reload contains
the sign extension of y from DImode to TImode, even though this is
dead (not required) for left shifts by more than WORD_SIZE bits.

(insn 11 8 12 2 (parallel [
            (set (reg:TI 0 ax [orig:91 y ] [91])
                (sign_extend:TI (reg:DI 1 dx [97])))
            (clobber (reg:CC 17 flags))
            (clobber (scratch:DI))
        ]) {extendditi2}

What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.

The proposed solution is to add a define_insn_and_split for such
left shifts (of sign or zero extensions) that only have a non-zero
highpart, where the extension is redundant and eliminated, that can
be split after reload, without scratch registers or early clobbers.

This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.

For the test case above, we previously generated with -O2:

foo64: xorl    %eax, %eax
        xorq    %rsi, %rdx
        xorq    %rdi, %rax
        ret

with this patch, we now generate:

foo64: movq    %rdi, %rax
        xorq    %rsi, %rdx
        ret

Likewise for the related -m32 test case, we go from:

foo32:  movl    12(%esp), %eax
        movl    %eax, %edx
        xorl    %eax, %eax
        xorl    8(%esp), %edx
        xorl    4(%esp), %eax
        ret

to the improved:

foo32:  movl    12(%esp), %edx
        movl    4(%esp), %eax
        xorl    8(%esp), %edx
        ret

2023-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (peephole2): Simplify zeroing a register
followed by an IOR, XOR or PLUS operation on it, into a move.
(*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to
eliminate (and hide from reload) unnecessary word to doubleword
extensions that are followed by left shifts by sufficiently large,
but valid, bit counts.

gcc/testsuite/ChangeLog
* gcc.target/i386/ashldi3-1.c: New 32-bit test case.
* gcc.target/i386/ashlti3-2.c: New 64-bit test case.

Use cvt_op to save intermediate type operand instead of "subtle" vec_dest.

When there're multiple operands in vec_oprnds0, vec_dest will be
overwrited to vectype_out, but in multi_step_cvt case, cvt_type is
expected. It caused an ICE when verify_gimple_in_cfg.

gcc/ChangeLog:

PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use cvt_op to
save intermediate type operand instead of "subtle" vec_dest
for case NONE.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110371.c: New test.

Don't use intermiediate type for FIX_TRUNC_EXPR when ftrapping-math.

> > Hmm, good question.  GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined.  So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following Annex F (unspecified value plus "invalid" exception
> for out-of-range floating-to-integer conversions rather than undefined
> behavior).  But we don't achieve that very well at present (see bug 93806
> comments 27-29 for examples of how such conversions produce wobbly
> values).

That would mean guarding this with !flag_trapping_math would be the appropriate
thing to do.

gcc/ChangeLog:

PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Don't use
intermiediate type for FIX_TRUNC_EXPR when ftrapping-math.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110018-1.c: Add -fno-trapping-math to dg-options.
* gcc.target/i386/pr110018-2.c: Ditto.

i386: Sync tune_string with arch_string for target attribute arch=*

For function with target attribute arch=*, current logic will set its
tune to -mtune from command line so all target_clones will get same
tuning flags which would affect the performance for each clone. Override
tune with arch if tune was not explicitly specified to get proper tuning
flags for target_clones.

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
Override tune_string with arch_string if tune_string is not
explicitly specified.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mvc17.c: New test.

RISC-V: Fix one test failure of dg config.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vlmul_ext-2.c: Add -Wno-psabi for dg.

d: Suboptimal codegen for __builtin_expect(cond, false)

Since PR96435, both boolean objects and expressions have been evaluated
in the following way.

(*(ubyte*)&obj_or_expr) & 1

It has been noted that sometimes this can cause the back-end to optimize
in non-obvious ways - in particular with __builtin_expect.

This @safe feature is now restricted to just when reading the value of a
bool field that comes from a union.

PR d/110359

gcc/d/ChangeLog:

* d-convert.cc (convert_for_rvalue): Only apply the @safe boolean
conversion to boolean fields of a union.
(convert_for_condition): Call convert_for_rvalue in the default case.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110359.d: New test.

Daily bump.

d: Merge upstream dmd, druntime a45f4e9f43, phobos 106038f2e.

D front-end changes:

- Import dmd v2.103.1.
- Deprecated invalid special token sequences inside token strings.

D runtime changes:

- Import druntime v2.103.1.

Phobos changes:

- Import phobos v2.103.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd a45f4e9f43.
* dmd/VERSION: Bump version to v2.103.1.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime a45f4e9f43.
* src/MERGE: Merge upstream phobos 106038f2e.

RISC-V: Optimize VSETVL codegen of SELECT_VL with LEN_MASK_{LOAD, STORE}

This patch is depending on LEN_MASK_{LOAD,STORE} patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622742.html

After enabling the LEN_MASK_{LOAD,STORE}, I notice that there is a case that VSETVL PASS need to be optimized:

void
f (int32_t *__restrict a,
   int32_t *__restrict b,
   int32_t *__restrict cond,
   int n)
{
  for (int i = 0; i < 8; i++)
    if (cond[i])
      a[i] = b[i];
}

Before this patch:
f:
        vsetivli        a5,8,e8,mf4,tu,mu   --> Propagate "8" to the following vsetvl
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        li      a3,8
        vmsne.vi        v0,v0,0
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        sub     a4,a3,a5
        beq     a3,a5,.L6
        slli    a5,a5,2
        add     a2,a2,a5
        add     a1,a1,a5
        add     a0,a0,a5
        vsetvli a5,a4,e8,mf4,tu,mu     --> Propagate "a4" to the following vsetvl
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        vmsne.vi        v0,v0,0
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
.L6:
        ret

Current VSETLV PASS only enable AVL propagation of VLMAX AVL ("zero").
Now, we enable AVL propagation of immediate && conservative non-VLMAX.

After this patch:

f:
        vsetivli        a5,8,e8,mf4,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        li      a3,8
        vmsne.vi        v0,v0,0
        vsetivli        zero,8,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        sub     a4,a3,a5
        beq     a3,a5,.L6
        slli    a5,a5,2
        vsetvli a4,a4,e8,mf4,ta,ma
        add     a2,a2,a5
        vle32.v v0,0(a2)
        add     a1,a1,a5
        vsetvli a6,zero,e32,m1,ta,ma
        add     a0,a0,a5
        vmsne.vi        v0,v0,0
        vsetvli zero,a4,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
.L6:
        ret

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::parse_insn): Ehance
AVL propagation.
* config/riscv/riscv-vsetvl.h: New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: Add dump checks.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: New test.

RISC-V: fix expand function of vlmul_ext RVV intrinsic

Consider this following case:
void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}

Compilation fails with:
test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
    5 | }
      | ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
        (mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
     (nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
        ../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
        ../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
        ../.././riscv-gcc/gcc/function.cc:1984

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: change emit_insn to
emit_move_insn

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.

RISC-V: Enable len_mask{load, store} and remove len_{load, store}

This patch enable len_mask_{load,store} to support flow-control in RVV auto-vectorization.

Consider this following case:
void
f (int32_t *__restrict a,
   int32_t *__restrict b,
   int32_t *__restrict cond,
   int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i];
}

Before this patch:
<source>:9:21: missed: couldn't vectorize loop
<source>:9:21: missed: not vectorized: control flow in loop.

After this patch:
f:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        slli    a4,a5,2
        vmsne.vi        v0,v0,0
        sub     a3,a3,a5
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        add     a2,a2,a4
        add     a1,a1,a4
        add     a0,a0,a4
        bne     a3,zero,.L3
.L5:
        ret

gcc/ChangeLog:

* config/riscv/autovec.md (len_load_<mode>): Remove.
(len_maskload<mode><vm>): Remove.
(len_store_<mode>): New pattern.
(len_maskstore<mode><vm>): New pattern.
* config/riscv/predicates.md (autovec_length_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_load_store): New function.
* config/riscv/riscv-v.cc (emit_vlmax_masked_insn): Ditto.
(emit_nonvlmax_masked_insn): Ditto.
(expand_load_store): Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_contiguous_store_insn): Add avl_type operand
into pred_store.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c: New test.

internal-fn: Fix bug of BIAS argument index

When trying to enable LEN_MASK_{LOAD,STORE} in RISC-V port,
I found I made a mistake in case of argument index of BIAS.

This patch is an obvious fix.

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Fix bug of BIAS
argument index.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add Lehua Ding to write after approval

configure, Darwin: Ensure overrides to host-pie are passed to gcc configure.

The latest versions of Darwin on the Aarch64 platform mandate PIE executables.
On x86_64 it remains optional, but produces tool warnings after Darwin20, so
we default to PIE executables there too.

All (non-PowerPC) 64b Darwin platforms mandate PIC code and therefore force
host_shared on (we issue a diagnostic if the user tries to configure them
non-shared).

However, this also means we cannot test the host_shared setting independently
of the host_pie setting so that the logic for setting PICFLAG must be amended
for Darwin.

For Darwin versions required to have PIE executables, in the event that the
user tries to configure these as --disable-host-pie, we issue a warning and
override the setting. These versions must also switch host_pie on even if it
is not given in the configure line. To cater for this we pass the current
value of host_pie, as determined by top-level configure, to the GCC configure.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
ChangeLog:

* Makefile.def: Pass the enable-host-pie value to GCC configure.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Adjust the logic for shared and PIE host flags to
ensure that PIE is passed for hosts that require it.

Revert "RISC-V:Add float16 tuple type abi"

This reverts commit f9ab5d62c94547499de52c800ab914cc8e802212 due to the
bootstrap failure on machine mode out of range memory access.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/vector.md: Revert.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: Revert.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: Ditto.
* gcc.target/riscv/rvv/base/abi-18.c: Ditto.

Revert "RISC-V:Add float16 tuple type support"

This reverts commit 8a96f240d71d367a2955ab9e0f0fef3a0b0e2a74 due to
bootstrap failure on mode out of range access, will commit this patch
after the issue addressed.

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Revert changes.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): Ditto.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t): Ditto.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Diito.
(vfloat16m2x4_t): Diito.
(vfloat16m4x2_t): Diito.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Ditto.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-28.c: Removed.
* gcc.target/riscv/rvv/base/tuple-29.c: Removed.
* gcc.target/riscv/rvv/base/tuple-30.c: Removed.
* gcc.target/riscv/rvv/base/tuple-31.c: Removed.
* gcc.target/riscv/rvv/base/tuple-32.c: Removed.

Signed-off-by: Pan Li <pan2.li@intel.com>

GIMPLE_FOLD: Apply LEN_MASK_{LOAD,STORE} into GIMPLE_FOLD

Hi, since we are going to have LEN_MASK_{LOAD,STORE} into loopVectorizer.

Currenly,
1. we can fold MASK_{LOAD,STORE} into MEM when mask is all ones.
2. we can fold LEN_{LOAD,STORE} into MEM when (len - bias) is VF.

Now, I think it makes sense that we can support

fold LEN_MASK_{LOAD,STORE} into MEM when both mask = all ones and (len - bias) is VF.

gcc/ChangeLog:

* gimple-fold.cc (arith_overflowed_p): Apply LEN_MASK_{LOAD,STORE}.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_partial_store): Ditto.
(gimple_fold_call): Ditto.

Refine maskloadmn pattern with UNSPEC_MASKLOAD.

If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
it to be transformed to vpblendd.

gcc/ChangeLog:

PR target/110309
* config/i386/sse.md (maskload<mode><avx512fmaskmodelower>):
Refine pattern with UNSPEC_MASKLOAD.
(maskload<mode><avx512fmaskmodelower>): Ditto.
(*<avx512>_load<mode>_mask): Extend mode iterator to
VI12HFBF_AVX512VL.
(*<avx512>_load<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110309.c: New test.

SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'

gcc/ChangeLog:

* tree-ssa-alias.cc (call_may_clobber_ref_p_1): Add LEN_MASK_STORE.

SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Apply
LEN_MASK_{LOAD,STORE}

RISC-V:Add float16 tuple type abi

gcc/ChangeLog:

* config/riscv/vector.md: Add float16 attr at sew、vlmul and ratio.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.

Daily bump.

i386: Add alternate representation for {and,or,xor}b %ah,%dh.

A patch that I'm working on to improve RTL simplifications in the
middle-end results in the regression of pr78904-1b.c, due to changes in
the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic.
See also PR target/78904.

This patch avoids/prevents those failures by adding support for the
alternate representation, duplicating the existing *<code>qi_ext<mode>_2
as *<code>qi_ext<mode>_3 (the new version also replacing any_or with
any_logic to provide *andqi_ext<mode>_3 in the same pattern).  Removing
the original pattern isn't trivial, as it's generated by define_split,
but this can be investigated after the other pieces are approved.

The current representation of this instruction is:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
        (const_int 8 [0x8])
        (const_int 8 [0x8]))
    (subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94)
                    (const_int 8 [0x8])
                    (const_int 8 [0x8])) 0)
            (subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
                    (const_int 8 [0x8])
                    (const_int 8 [0x8])) 0)) 0))

after my proposed middle-end improvement, we attempt to recognize:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
        (const_int 8 [0x8])
        (const_int 8 [0x8]))
    (zero_extract:DI (xor:DI (reg:DI 94)
            (reg/v:DI 87 [ aD.2763 ]))
        (const_int 8 [0x8])
        (const_int 8 [0x8])))

2023-06-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (*<code>qi_ext<mode>_3): New define_insn.

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): Truncate constant string
argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.

RISC-V: Refactor the integer ternary autovec pattern

Long time ago, I encounter ICE when trying to set clobber register as Pmode
and I forgot the reason.

So, I clobber SI scratch and PUT_MODE to make it Pmode after reload which
makes patterns look unreasonable.

According to Jeff's comments, I tried it again, it works now when we try to
set clobber register as Pmode and the patterns look more reasonable now.

The tests are all passed, Ok for trunk.

gcc/ChangeLog:

* config/riscv/autovec.md (*fma<mode>): set clobber to Pmode in expand stage.
(*fma<VI:mode><P:mode>): Ditto.
(*fnma<mode>): Ditto.
(*fnma<VI:mode><P:mode>): Ditto.

RISC-V: Support RVV floating-point auto-vectorization

This patch adds RVV floating-point auto-vectorization.
Also, fix attribute bug of floating-point ternary operations in vector.md.

gcc/ChangeLog:

* config/riscv/autovec.md (fma<mode>4): New pattern.
(*fma<mode>): Ditto.
(fnma<mode>4): Ditto.
(*fnma<mode>): Ditto.
(fms<mode>4): Ditto.
(*fms<mode>): Ditto.
(fnms<mode>4): Ditto.
(*fnms<mode>): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn):
New function.
* config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
* config/riscv/vector.md: Fix attribute bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Adjust tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.

LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

Hi, Jeff. I fix format as you suggested.
Ok for trunk ?

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn):
Apply LEN_MASK_{LOAD,STORE}.

IVOPTS: Add LEN_MASK_{LOAD, STORE} into 'get_alias_ptr_type_for_ptr_address'

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address):
Add LEN_MASK_{LOAD,STORE}.

text-art: remove explicit #include of C++ standard library headers

gcc/analyzer/ChangeLog:
* access-diagram.cc: Add #define INCLUDE_VECTOR.
* bounds-checking.cc: Likewise.

gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add #define INCLUDE_VECTOR.
* diagnostic.cc: Likewise.
* text-art/box-drawing.cc: Likewise.
* text-art/canvas.cc: Likewise.
* text-art/ruler.cc: Likewise.
* text-art/selftests.cc: Likewise.
* text-art/selftests.h (text_art::canvas): New forward decl.
* text-art/style.cc: Add #define INCLUDE_VECTOR.
* text-art/styled-string.cc: Likewise.
* text-art/table.cc: Likewise.
* text-art/table.h: Remove #include <vector>.
* text-art/theme.cc: Add #define INCLUDE_VECTOR.
* text-art/types.h: Check that INCLUDE_VECTOR is defined.
Remove #include of <vector> and <string>.
* text-art/widget.cc: Add #define INCLUDE_VECTOR.
* text-art/widget.h: Remove #include <vector>.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: Add
#define INCLUDE_VECTOR.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

Address comments from Richard and Bernhard from V5 patch.
V6 fixed all issues according their comments.

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Adapt for LEN_MASK_STORE.
(internal_load_fn_p): Add LEN_MASK_LOAD.
(internal_store_fn_p): Add LEN_MASK_STORE.
(internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
(internal_fn_stored_value_index): Add LEN_MASK_STORE.
(internal_len_load_store_bias): Add LEN_MASK_{LOAD,STORE}.
* optabs-tree.cc (can_vec_mask_load_store_p): Adapt for LEN_MASK_{LOAD,STORE}.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
(vectorizable_load): Ditto.

Daily bump.

compiler, libgo: support bootstrapping gc compiler

In the Go 1.21 release the package internal/profile imports
internal/lazyregexp.  That works when bootstrapping with Go 1.17,
because that compiler has internal/lazyregep and permits importing it.
We also have internal/lazyregexp in libgo, but since it is not installed
it is not available for importing.  This CL adds internal/lazyregexp
to the list of internal packages that are installed for bootstrapping.

The Go 1.21, and earlier, releases have a couple of functions in
the internal/abi package that are always fully intrinsified.
The gofrontend recognizes and intrinsifies those functions as well.
However, the gofrontend was also building function descriptors
for references to the functions without calling them, which
failed because there was nothing to refer to.  That is OK for the
gc compiler, which guarantees that the functions are only called,
not referenced.  This CL arranges to not generate function descriptors
for these functions.

For golang/go#60913

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/504798

c++: provide #include hint for missing includes [PR110164]

PR c++/110164 notes that in cases where we have a forward decl
of a std library type such as:

std::array<int, 10> x;

we emit this diagnostic:

error: aggregate ‘std::array<int, 10> x’ has incomplete type and cannot be defined

This patch adds this hint to the diagnostic:

note: ‘std::array’ is defined in header ‘<array>’; this is probably fixable by adding ‘#include <array>’

gcc/cp/ChangeLog:
PR c++/110164
* cp-name-hint.h (maybe_suggest_missing_header): New decl.
* decl.cc: Define INCLUDE_MEMORY. Add include of
"cp/cp-name-hint.h".
(start_decl_1): Call maybe_suggest_missing_header.
* name-lookup.cc (maybe_suggest_missing_header): Remove "static".

gcc/testsuite/ChangeLog:
PR c++/110164
* g++.dg/diagnostic/missing-header-pr110164.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: Add support for -std={c,gnu}++2{c,6}

It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

Since C++23 is essentially finished and its __cplusplus value has
settled to 202302L, I've updated cpp_init_builtins and marked
-std=c++2b Undocumented and made -std=c++23 no longer Undocumented.

As for __cplusplus, I've chosen 202400L:

  $ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
  #define __cplusplus 202400L

I've verified the patch with a simple test, exercising the new
directives.  Don't forget to update your GXX_TESTSUITE_STDS!

This patch does not add -Wc++26-extensions.

gcc/c-family/ChangeLog:

* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.

gcc/ChangeLog:

* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26.  Document that for C++23, its value is 202302L.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.

libcpp/ChangeLog:

* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202400L for C++26.
Set __cplusplus to 202302L for C++23.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp23/cplusplus.C: Adjust expected value.
* g++.dg/cpp26/cplusplus.C: New test.

libcpp: allow UCS_LIMIT codepoints in UTF-8 strings

Fixes r14-1954 (libcpp: reject codepoints above 0x10FFFF, 2023-06-06)

libcpp/

* charset.cc: Allow `UCS_LIMIT` in UTF-8 strings.

Reported-by: Damien Guibouret <damien.guibouret@partition-saving.com>
Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>

libstdc++: Use RAII in std::vector::_M_realloc_insert

Replace the try-block with RAII types for deallocating storage and
destroying elements.

libstdc++-v3/ChangeLog:

* include/bits/vector.tcc (_M_realloc_insert): Replace try-block
with RAII types.

Tiny phiprop compile time optimization

gcc/ChangeLog:

* tree-ssa-phiprop.cc (propagate_with_phi): Compute post dominators on
demand.
(pass_phiprop::execute): Do not compute it here; return
update_ssa_only_virtuals if something changed.
(pass_data_phiprop): Remove TODO_update_ssa from todos.

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): Pass actual argument
to scalar CHARACTER(1),VALUE dummy argument by value.

gcc/testsuite/ChangeLog:

PR fortran/110360
* gfortran.dg/value_9.f90: New test.

Fix power10 fusion bug with prefixed loads, PR target/105325

This changes fixes PR target/105325.  PR target/105325 is a bug where an
invalid lwa instruction is generated due to power10 fusion of a load
instruction to a GPR and an compare immediate instruction with the immediate
being -1, 0, or 1.

In some cases, when the load instruction is done, the GCC compiler would
generate a load instruction with an offset that was too large to fit into the
normal load instruction.

In particular, loads from the stack might originally have a small offset, so
that the load is not a prefixed load.  However, after the stack is set up, and
register allocation has been done, the offset now is large enough that we would
have to use a prefixed load instruction.

The support for prefixed loads did not consider that patterns with a fused load
and compare might have a prefixed address.  Without this support, the proper
prefixed load won't be generated.

In the original code, when the split2 pass is run after reload has finished the
ds_form_mem_operand predicate that was used for lwa and ld no longer returns
true.  When the pattern was created, ds_form_mem_operand recognized the insn as
being valid since the offset was small.  But after register allocation,
ds_form_mem_operand did not return true.  Because it didn't return true, the
insn could not be split.  Since the insn was not split and the prefix support
did not indicate a prefixed instruction was used, the wrong load is generated.

The solution involves:

    1) Don't use ds_form_mem_operand for ld and lwa, always use
non_update_memory_operand.

    2) Delete ds_form_mem_operand since it is no longer used.

    3) Use the "YZ" constraints for ld/lwa instead of "m".

    4) If we don't need to sign extend the lwa, convert it to lwz, and use
cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
generate.

    5) Insure that the insn using lwa will be recognized as having a prefixed
operand (and hence the insn length will be 16 bytes instead of 8
bytes).

5a) Set the prefixed and maybe_prefix attributes to know that
    fused_load_cmpi are also load insns;

5b) In the case where we are just setting CC and not using the memory
    afterward, set the clobber to use a DI register, and put an
    explicit sign_extend operation in the split;

5c) Set the sign_extend attribute to "yes" for lwa.

5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
    ensure that lwa is treated as a ds-form instruction and not as
    a d-form instruction (i.e. lwz).

    6) Add a new test case for this case.

    7) Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
longer using ds_form_mem_operand, the ld and lwa instructions will fuse
x-form (reg+reg) addresses in addition ds-form (reg+offset or reg).

2023-06-23   Michael Meissner  <meissner@linux.ibm.com>

gcc/

PR target/105325
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
allowed prefixed lwa to be generated.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
* config/rs6000/rs6000.md (prefixed attribute): Add support for load
plus compare immediate fused insns.
(maybe_prefixed): Likewise.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.

Co-Authored-By: Aaron Sawdey <acsawdey@linux.ibm.com>

testsuite,objective-c++: Fix imported NSObjCRuntime.h.

We have imported some headers from the GNUStep project to allow us
to maintain the testsuite independent to changing versions of system
headers.

One of these headers has a macro that (now we have support for
__has_feature) expands to a declaration that triggers a warning.

These headers are considered part of the implementation so that, in
this case, we can suppress the warning with the system_header pragma.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:

* objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h: Make
this header use pragma system_header.

Improved SUBREG simplifications in simplify-rtx.cc's simplify_subreg.

An x86 backend improvement that I'm working results in combine attempting
to recognize:

(set (reg:DI 87 [ xD.2846 ])
     (ior:DI (subreg:DI (ashift:TI (zero_extend:TI (reg:DI 92))
                                   (const_int 64 [0x40])) 0)
             (reg:DI 91)))

where the lowpart SUBREG has difficulty seeing through the (hi<<64)
that the lowpart must be zero.  Rather than workaround this in the
backend, the better fix is to teach simplify-rtx that
lowpart((hi<<64)|lo) -> lo and highpart((hi<<64)|lo) -> hi, so that
all backends benefit.  Reducing the number of places where the
middle-end generates a SUBREG of something other than REG is a
good thing.

On x86_64-pc-linux-gnu, the testcase pr78904-1b.c FAILs with this patch,
due to changes in expected/canonical RTL, for which a backend patch to
i386.md has already been provisionally approved.

2023-06-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* simplify-rtx.cc (simplify_subreg):  Optimize lowpart SUBREGs
of ASHIFT to const0_rtx with sufficiently large shift count.
Optimize highpart SUBREGs of ASHIFT as the shift operand when
the shift count is the correct offset.  Optimize SUBREGs of
multi-word logic operations if the SUBREGs of both operands
can be simplified.

Fix initializer_constant_valid_p_1 TYPE_PRECISION use

initializer_constant_valid_p_1 is letting through all conversions
of float vector types that have the same number of elements but
that's of course not valid. The following restricts the code
to scalar floating point types as was probably intended (only
scalar integer types are handled as well).

* varasm.cc (initializer_constant_valid_p_1): Only
allow conversions between scalar floating point types.

Deal with vector typed operands in conversions

The following avoids using TYPE_PRECISION on VECTOR_TYPE when
looking for bit-precision changes in vectorizable_assignment.
We didn't anticipate a stmt like

_21 = VIEW_CONVERT_EXPR<unsigned int>(vect__1.7_28);

and the following makes sure to handle that.

* tree-vect-stmts.cc (vectorizable_assignment):
Properly handle non-integral operands when analyzing
conversions.

[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
using build_vector_from_val with the element of input operand, and
mask's type if operand and mask's types don't match.

gcc/testsuite/ChangeLog:
PR tree-optimization/110280
* gcc.target/aarch64/sve/pr110280.c: New test.

Fix tree_simple_nonnegative_warnv_p for VECTOR_TYPEs

tree_simple_nonnegative_warnv_p ends up being called on VECTOR_TYPEs
which I think even gets the wrong answer here for tcc_comparison
since vector bools are signed. The following properly guards
that with !VECTOR_TYPE_P.

* fold-const.cc (tree_simple_nonnegative_warnv_p): Guard
the truth_value_p case with !VECTOR_TYPE_P.

Properly guard vect_look_through_possible_promotion

The function ends up getting called on VECTOR_TYPEs which it
really isn't prepared for and with the TYPE_PRECISION checking
changes will ICE. The following exits early when the type
to work on isn't scalar integral.

* tree-vect-patterns.cc (vect_look_through_possible_promotion):
Exit early when the type isn't scalar integral.

Use element_precision for match.pd arith conversion optimization

The simplification (outertype)((innertype0)a+(innertype1)b) to
((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check
whether it can elide a conversion but in some paths there can
be VECTOR_TYPEs where this instead compares the number of lanes.
The following fixes the missed optimizations and uses
element_precision in those places.

* match.pd ((outertype)((innertype0)a+(innertype1)b)
-> ((newtype)a+(newtype)b)): Use element_precision
where appropriate.

Bogus and missed folding on vector compares

fold_binary tries to transform (double)float1 CMP (double)float2
into float1 CMP float2 but ends up using TYPE_PRECISION on the
argument types. For vector types that compares the number of
lanes which should be always equal (so it's harmless as to
not generating wrong code). The following instead properly
uses element_precision.

The same happens in the corresponding match.pd pattern.

* fold-const.cc (fold_binary_loc): Use element_precision
when trying (double)float1 CMP (double)float2 to
float1 CMP float2 simplification.
* match.pd: Likewise.

Optimize vector codegen for invariant loads, fix SLP support

The following avoids creating duplicate stmts for invariant loads
which was necessary when the vector stmts were in a linked list.
It also fixes SLP support which didn't correctly create the
appropriate number of copies.

* tree-vect-stmts.cc (vectorizable_load): Avoid useless
copies of VMAT_INVARIANT vectorized stmts, fix SLP support.

Improve vector_vector_composition_type

We sometimes get to ask to decompose, say V2DFmode into two halves.
Currently this results in composing it from two DImode pieces
instead of the obvious two DFmode pieces. The following adjusts
vector_vector_composition_type for this trivial case and avoids
a VIEW_CONVERT_EXPR in the initial code generation.

* tree-vect-stmts.cc (vector_vector_composition_type):
Handle composition of a vector from a number of elements that
happens to match its number of lanes.

Daily bump.

rust: Update usage of TARGET_AIX to TARGET_AIX_OS

This was noticed when fixing the gccgo usage of the macro, the
rust usage is very similar.

TARGET_AIX is defined as a non-zero value on linux/powerpc64le
which may cause unexpected behavior. TARGET_AIX_OS should be
used to toggle AIX specific behavior.

2023-06-22 Paul E. Murphy <murphyp@linux.ibm.com>

gcc/rust/
* rust-object-export.cc [TARGET_AIX]: Rename and update usage to
TARGET_AIX_OS.

go: Update usage of TARGET_AIX to TARGET_AIX_OS

TARGET_AIX is defined to a non-zero value on linux and maybe other
powerpc64le targets. This leads to unexpected behavior such as
dropping the .go_export section when linking a shared library
on linux/powerpc64le.

Instead, use TARGET_AIX_OS to toggle AIX specific behavior.

Fixes golang/go#60798.

2023-06-22 Paul E. Murphy <murphyp@linux.ibm.com>

gcc/go/
* go-backend.cc [TARGET_AIX]: Rename and update usage to TARGET_AIX_OS.
* go-lang.cc: Likewise.

configure: Implement --enable-host-bind-now

As promised in the --enable-host-pie patch, this patch adds another
configure option, --enable-host-bind-now, which adds -z now when linking
the compiler executables in order to extend hardening.  BIND_NOW with RELRO
allows the GOT to be marked RO; this prevents GOT modification attacks.

This option does not affect linking of target libraries; you can use
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.

With this patch:
$ readelf -Wd cc1{,plus,obj,gm2} f951 lto1 cpp  rust1 gnat1 | grep FLAGS
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
0x000000000000001e (FLAGS)              BIND_NOW
0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE

c++tools/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.
* configure: Regenerate.

gcc/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.  Add
-Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
* configure: Regenerate.
* doc/install.texi: Document --enable-host-bind-now.

lto-plugin/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.  Link with
-z,now.
* configure: Regenerate.

Change fma_reassoc_width tuning for ampere1

This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There
are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1.

libgomp.texi: Improve OpenMP ICV description

Use @var{} instead of @emph{} - for semantic texinfo formatting; the result
is similar: slanted instead of italic in PDF, still italic in HTML, albeit
in info is is now uppercase instead of '_' as pre/suffix.

The patch also documents the newer _ALL/_DEV/_DEV_<no> env var suffixes
and as it refers to the ICV vars and their scope, those were added to the
OMP_ env vars for reference. For OMP_NESTING, a note that those were
deprecated was added plus a bunch of cross references. For OMP_ALLOCATOR,
add note about the lack of per-device env vars support.

A new section, consisting mostly of cross references was added to document
the implementation-defined ICV initialization, especially as OpenMP demands
that implementations document what they do for 'implementation defined'.

For nvptx, the implementation-defined used stack size was documented

libgomp/
* libgomp.texi: Use @var for ICV vars.
(OpenMP Environment Variables): Mention _ALL/_DEV/_DEV_<no> variants,
document which ICV is set and which scope the ICV has; extend/cleanup
some @ref.
(Implementation-defined ICV Initialization): New.
(nvptx): Document the implementation-defined used per-warp stack size.

tree-optimization/110332 - fix ICE with phiprop

The following fixes an ICE that occurs when we visit an edge
inserted load from the code validating correctness for inserting
an aggregate copy there. We can simply skip those loads here.

PR tree-optimization/110332
* tree-ssa-phiprop.cc (propagate_with_phi): Always
check aliasing with edge inserted loads.

* g++.dg/torture/pr110332.C: New testcase.
* gcc.dg/torture/pr110332-1.c: Likewise.
* gcc.dg/torture/pr110332-2.c: Likewise.

i386: Convert ptestz of pandn into ptestc.

This patch is the next installment in a set of backend patches around
improvements to ptest/vptest.  A previous patch optimized the sequence
t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
property that ZF is set to (X&Y) == 0.  This patch performs a similar
transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
equivalent ptestc(y,x), using the property that the CF flags is set to
(~X&Y) == 0.  The tricky bit is that this sets the CF flag instead of
the ZF flag, so we can only perform this transformation when we can
also convert the flags consumer, as well as the producer.

For the test case:

int foo (__m128i x, __m128i y)
{
  __m128i a = x & ~y;
  return __builtin_ia32_ptestz128 (a, a);
}

With -O2 -msse4.1 we previously generated:

foo: pandn   %xmm0, %xmm1
        xorl    %eax, %eax
        ptest   %xmm1, %xmm1
        sete    %al
        ret

with this patch we now generate:

foo: xorl    %eax, %eax
        ptest   %xmm0, %xmm1
        setc    %al
        ret

At the same time, this patch also provides alternative fixes for
PR target/109973 and PR target/110118, by recognizing that ptestc(x,x)
always sets the carry flag (X&~X is always zero).  This is achieved
both by recognizing the special case in ix86_expand_sse_ptest and with
a splitter to convert an eligible ptest into an stc.

2023-06-22  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
expansion of ptestc with equal operands as producing const1_rtx.
* config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
estimates of UNSPEC_PTEST, where the ptest performs the PAND
or PAND of its operands.
* config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
of reg_equal_p operands into an x86_stc instruction.
(define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
(define_split): Similar to above for strict_low_part destinations.
(define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx-vptest-4.c: New test case.
* gcc.target/i386/avx-vptest-5.c: Likewise.
* gcc.target/i386/avx-vptest-6.c: Likewise.
* gcc.target/i386/pr109973-1.c: Update test case.
* gcc.target/i386/pr109973-2.c: Likewise.
* gcc.target/i386/sse4_1-ptest-4.c: New test case.
* gcc.target/i386/sse4_1-ptest-5.c: Likewise.
* gcc.target/i386/sse4_1-ptest-6.c: Likewise.

analyzer: add text-art visualizations of out-of-bounds accesses [PR106626]

This patch extends -Wanalyzer-out-of-bounds so that, where possible, it
will emit a text art diagram visualizing the spatial relationship between
(a) the memory region that the analyzer predicts would be accessed, versus
(b) the range of memory that is valid to access - whether they overlap,
are touching, are close or far apart; which one is before or after in
memory, the relative sizes involved, the direction of the access (read vs
write), and, in some cases, the values of data involved.  This diagram
can be suppressed using -fdiagnostics-text-art-charset=none.

For example, given:

int32_t arr[10];

int32_t int_arr_read_element_before_start_far(void)
{
  return arr[-100];
}

it emits:

demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
    7 |   return arr[-100];
      |          ~~~^~~~~~
  ‘int_arr_read_element_before_start_far’: event 1
    |
    |    7 |   return arr[-100];
    |      |          ~~~^~~~~~
    |      |             |
    |      |             (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
    |
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’

  ┌───────────────────────────┐
  │read of ‘int32_t’ (4 bytes)│
  └───────────────────────────┘
                ^
                │
                │
  ┌───────────────────────────┐              ┌────────┬────────┬─────────┐
  │                           │              │  [0]   │  ...   │   [9]   │
  │    before valid range     │              ├────────┴────────┴─────────┤
  │                           │              │‘arr’ (type: ‘int32_t[10]’)│
  └───────────────────────────┘              └───────────────────────────┘
  ├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
                │                    │                     │
   ╭────────────┴───────────╮   ╭────┴────╮        ╭───────┴──────╮
   │⚠️  under-read of 4 bytes│   │396 bytes│        │size: 40 bytes│
   ╰────────────────────────╯   ╰─────────╯        ╰──────────────╯

and given:

  #include <string.h>

  void
  test_non_ascii ()
  {
    char buf[5];
    strcpy (buf, "文字化け");
  }

it emits:

demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
  ├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
  │       文        │      字       │      化      │      け      │ NUL  │
  ├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │    │    │     │    │    │    │    │    │    │     │
     │     │     │    │    │     │    │    │    │    │    │    │     │
     v     v     v    v    v     v    v    v    v    v    v    v     v
  ┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
  │ [0] │      ...       │[4] ││                                         │
  ├─────┴────────────────┴────┤│            after valid range            │
  │  ‘buf’ (type: ‘char[5]’)  ││                                         │
  └───────────────────────────┘└─────────────────────────────────────────┘
  ├─────────────┬─────────────┤├────────────────────┬────────────────────┤
                │                                   │
       ╭────────┴────────╮              ╭───────────┴──────────╮
       │capacity: 5 bytes│              │⚠️  overflow of 8 bytes│
       ╰─────────────────╯              ╰──────────────────────╯

showing that the overflow occurs partway through the UTF-8 encoding of
the U+5b57 code point.

There are lots more examples in the test suite.

It doesn't show up in this email, but the above diagrams are colorized
to constrast the valid and invalid access ranges.

gcc/ChangeLog:
PR analyzer/106626
* Makefile.in (ANALYZER_OBJS): Add analyzer/access-diagram.o.
* doc/invoke.texi (Wanalyzer-out-of-bounds): Add description of
text art.
(fanalyzer-debug-text-art): New.

gcc/analyzer/ChangeLog:
PR analyzer/106626
* access-diagram.cc: New file.
* access-diagram.h: New file.
* analyzer.h (class region_offset): Add default ctor.
(region_offset::make_byte_offset): New decl.
(region_offset::concrete_p): New.
(region_offset::get_concrete_byte_offset): New.
(region_offset::calc_symbolic_bit_offset): New decl.
(region_offset::calc_symbolic_byte_offset): New decl.
(region_offset::dump_to_pp): New decl.
(region_offset::dump): New decl.
(operator<, operator<=, operator>, operator>=): New decls for
region_offset.
* analyzer.opt
(-param=analyzer-text-art-string-ellipsis-threshold=): New.
(-param=analyzer-text-art-string-ellipsis-head-len=): New.
(-param=analyzer-text-art-string-ellipsis-tail-len=): New.
(-param=analyzer-text-art-ideal-canvas-width=): New.
(fanalyzer-debug-text-art): New.
* bounds-checking.cc: Include "intl.h", "diagnostic-diagram.h",
and "analyzer/access-diagram.h".
(class out_of_bounds::oob_region_creation_event_capacity): New.
(out_of_bounds::out_of_bounds): Add "model" and "sval_hint"
params.
(out_of_bounds::mark_interesting_stuff): Use the base region.
(out_of_bounds::add_region_creation_events): Use
oob_region_creation_event_capacity.
(out_of_bounds::get_dir): New pure vfunc.
(out_of_bounds::maybe_show_notes): New.
(out_of_bounds::maybe_show_diagram): New.
(out_of_bounds::make_access_diagram): New.
(out_of_bounds::m_model): New field.
(out_of_bounds::m_sval_hint): New field.
(out_of_bounds::m_region_creation_event_id): New field.
(concrete_out_of_bounds::concrete_out_of_bounds): Update for new
fields.
(concrete_past_the_end::concrete_past_the_end): Likewise.
(concrete_past_the_end::add_region_creation_events): Use
oob_region_creation_event_capacity.
(concrete_buffer_overflow::concrete_buffer_overflow): Update for
new fields.
(concrete_buffer_overflow::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_overflow::get_dir): New.
(concrete_buffer_over_read::concrete_buffer_over_read): Update for
new fields.
(concrete_buffer_over_read::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_overflow::get_dir): New.
(concrete_buffer_underwrite::concrete_buffer_underwrite): Update
for new fields.
(concrete_buffer_underwrite::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_underwrite::get_dir): New.
(concrete_buffer_under_read::concrete_buffer_under_read): Update
for new fields.
(concrete_buffer_under_read::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_under_read::get_dir): New.
(symbolic_past_the_end::symbolic_past_the_end): Update for new
fields.
(symbolic_buffer_overflow::symbolic_buffer_overflow): Likewise.
(symbolic_buffer_overflow::emit): Call maybe_show_notes.
(symbolic_buffer_overflow::get_dir): New.
(symbolic_buffer_over_read::symbolic_buffer_over_read): Update for
new fields.
(symbolic_buffer_over_read::emit): Call maybe_show_notes.
(symbolic_buffer_over_read::get_dir): New.
(region_model::check_symbolic_bounds): Add "sval_hint" param.  Pass
it and sized_offset_reg to diagnostics.
(region_model::check_region_bounds): Add "sval_hint" param, passing
it to diagnostics.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Pass logger to
pending_diagnostic::emit.
* engine.cc: Add logger param to pending_diagnostic::emit
implementations.
* infinite-recursion.cc: Likewise.
* kf-analyzer.cc: Likewise.
* kf.cc: Likewise.  Add nullptr for new param of
check_region_for_write.
* pending-diagnostic.h: Likewise in decl.
* region-model-manager.cc
(region_model_manager::get_or_create_int_cst): Convert param from
poly_int64 to const poly_wide_int_ref &.
(region_model_manager::maybe_fold_binop): Support type being NULL
when checking for floating-point types.
Check for (X + Y) - X => Y.  Be less strict about types when folding
associative ops.  Check for (X + Y) * CST => (X * CST) + (Y * CST).
* region-model-manager.h
(region_model_manager::get_or_create_int_cst): Convert param from
poly_int64 to const poly_wide_int_ref &.
* region-model.cc: Add logger param to pending_diagnostic::emit
implementations.
(region_model::check_external_function_for_access_attr): Update
for new param of check_region_for_write.
(region_model::deref_rvalue): Use nullptr rather than NULL.
(region_model::get_capacity): Handle RK_STRING.
(region_model::check_region_access): Add "sval_hint" param; pass it to
check_region_bounds.
(region_model::check_region_for_write): Add "sval_hint" param;
pass it to check_region_access.
(region_model::check_region_for_read): Add NULL for new param to
check_region_access.
(region_model::set_value): Pass rhs_sval to
check_region_for_write.
(region_model::get_representative_path_var_1): Handle SK_CONSTANT
in the check for infinite recursion.
* region-model.h (region_model::check_region_for_write): Add
"sval_hint" param.
(region_model::check_region_access): Likewise.
(region_model::check_symbolic_bounds): Likewise.
(region_model::check_region_bounds): Likewise.
* region.cc (region_offset::make_byte_offset): New.
(region_offset::calc_symbolic_bit_offset): New.
(region_offset::calc_symbolic_byte_offset): New.
(region_offset::dump_to_pp): New.
(region_offset::dump): New.
(struct linear_op): New.
(operator<, operator<=, operator>, operator>=): New, for
region_offset.
(region::get_next_offset): New.
(region::get_relative_symbolic_offset): Use ptrdiff_type_node.
(field_region::get_relative_symbolic_offset): Likewise.
(element_region::get_relative_symbolic_offset): Likewise.
(bit_range_region::get_relative_symbolic_offset): Likewise.
* region.h (region::get_next_offset): New decl.
* sm-fd.cc: Add logger param to pending_diagnostic::emit
implementations.
* sm-file.cc: Likewise.
* sm-malloc.cc: Likewise.
* sm-pattern-test.cc: Likewise.
* sm-sensitive.cc: Likewise.
* sm-signal.cc: Likewise.
* sm-taint.cc: Likewise.
* store.cc (bit_range::contains_p): Allow "out" to be null.
* store.h (byte_range::get_start_bit_offset): New.
(byte_range::get_next_bit_offset): New.
* varargs.cc: Add logger param to pending_diagnostic::emit
implementations.

gcc/testsuite/ChangeLog:
PR analyzer/106626
* gcc.dg/analyzer/data-model-1.c (test_16): Update for
out-of-bounds working.
* gcc.dg/analyzer/out-of-bounds-diagram-1-ascii.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-json.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-sarif.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-10.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-11.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-12.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-13.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-14.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-15.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-4.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-6.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-7.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-8.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-9.c: New test.
* gcc.dg/analyzer/pattern-test-2.c: Update expected results.
* gcc.dg/analyzer/pr101962.c: Update expected results.
* gcc.dg/plugin/analyzer_gil_plugin.c:  Add logger param to
pending_diagnostic::emit implementations.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: add support for "text art" diagrams

Existing text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance.  This makes it
hard to implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc).

This patch adds more flexible ways of creating text output:
- a canvas class, which can be "painted" to via random-access (rather
that sequentially)
- a table class for 2D grid layout, supporting items that span
multiple rows/columns
- a widget class for organizing diagrams hierarchically.

The patch also expands GCC's diagnostics subsystem so that diagnostics
can have "text art" diagrams - think ASCII art, but potentially
including some Unicode characters, such as box-drawing chars.

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

The patch adds a new "-fdiagnostics-text-art-charset=VAL" option, with
values:
- "none": don't emit diagrams (added to -fdiagnostics-plain-output)
- "ascii": use pure ASCII in diagrams
- "unicode": allow for conservative use of unicode drawing characters
(such as box-drawing characters).
- "emoji" (the default): as "unicode", but potentially allow for
conservative use of emoji in the output (such as U+26A0 WARNING SIGN).
I made it possible to disable emoji separately from unicode as I believe
there's a generation gap in acceptance of these characters (some older
programmers have a visceral reaction against them, whereas younger
programmers may have no problem with them).

Diagrams are emitted to stderr by default.  With SARIF output they are
captured as a location in "relatedLocations", with the diagram as a
code block in Markdown within a "markdown" property of a message.

This patch doesn't add any such diagram usage to GCC, saving that for
followups, apart from adding a plugin to the test suite to exercise the
functionality.

contrib/ChangeLog:
* unicode/gen-box-drawing-chars.py: New file.
* unicode/gen-combining-chars.py: New file.
* unicode/gen-printable-chars.py: New file.

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add text-art/box-drawing.o,
text-art/canvas.o, text-art/ruler.o, text-art/selftests.o,
text-art/style.o, text-art/styled-string.o, text-art/table.o,
text-art/theme.o, and text-art/widget.o.
* color-macros.h (COLOR_FG_BRIGHT_BLACK): New.
(COLOR_FG_BRIGHT_RED): New.
(COLOR_FG_BRIGHT_GREEN): New.
(COLOR_FG_BRIGHT_YELLOW): New.
(COLOR_FG_BRIGHT_BLUE): New.
(COLOR_FG_BRIGHT_MAGENTA): New.
(COLOR_FG_BRIGHT_CYAN): New.
(COLOR_FG_BRIGHT_WHITE): New.
(COLOR_BG_BRIGHT_BLACK): New.
(COLOR_BG_BRIGHT_RED): New.
(COLOR_BG_BRIGHT_GREEN): New.
(COLOR_BG_BRIGHT_YELLOW): New.
(COLOR_BG_BRIGHT_BLUE): New.
(COLOR_BG_BRIGHT_MAGENTA): New.
(COLOR_BG_BRIGHT_CYAN): New.
(COLOR_BG_BRIGHT_WHITE): New.
* common.opt (fdiagnostics-text-art-charset=): New option.
(diagnostic-text-art.h): New SourceInclude.
(diagnostic_text_art_charset) New Enum and EnumValues.
* configure: Regenerate.
* configure.ac (gccdepdir): Add text-art to loop.
* diagnostic-diagram.h: New file.
* diagnostic-format-json.cc (json_emit_diagram): New.
(diagnostic_output_format_init_json): Wire it up to
context->m_diagrams.m_emission_cb.
* diagnostic-format-sarif.cc: Include "diagnostic-diagram.h" and
"text-art/canvas.h".
(sarif_result::on_nested_diagnostic): Move code to...
(sarif_result::add_related_location): ...this new function.
(sarif_result::on_diagram): New.
(sarif_builder::emit_diagram): New.
(sarif_builder::make_message_object_for_diagram): New.
(sarif_emit_diagram): New.
(diagnostic_output_format_init_sarif): Set
context->m_diagrams.m_emission_cb to sarif_emit_diagram.
* diagnostic-text-art.h: New file.
* diagnostic.cc: Include "diagnostic-text-art.h",
"diagnostic-diagram.h", and "text-art/theme.h".
(diagnostic_initialize): Initialize context->m_diagrams and
call diagnostics_text_art_charset_init.
(diagnostic_finish): Clean up context->m_diagrams.m_theme.
(diagnostic_emit_diagram): New.
(diagnostics_text_art_charset_init): New.
* diagnostic.h (text_art::theme): New forward decl.
(class diagnostic_diagram): Likewise.
(diagnostic_context::m_diagrams): New field.
(diagnostic_emit_diagram): New decl.
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
-fdiagnostics-text-art-charset=.
(-fdiagnostics-plain-output): Add
-fdiagnostics-text-art-charset=none.
* gcc.cc: Include "diagnostic-text-art.h".
(driver_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
* opts-common.cc (decode_cmdline_options_to_array): Add
"-fdiagnostics-text-art-charset=none" to expanded_args for
-fdiagnostics-plain-output.
* opts.cc: Include "diagnostic-text-art.h".
(common_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
* pretty-print.cc (pp_unicode_character): New.
* pretty-print.h (pp_unicode_character): New decl.
* selftest-run-tests.cc: Include "text-art/selftests.h".
(selftest::run_tests): Call text_art_tests.
* text-art/box-drawing-chars.inc: New file, generated by
contrib/unicode/gen-box-drawing-chars.py.
* text-art/box-drawing.cc: New file.
* text-art/box-drawing.h: New file.
* text-art/canvas.cc: New file.
* text-art/canvas.h: New file.
* text-art/ruler.cc: New file.
* text-art/ruler.h: New file.
* text-art/selftests.cc: New file.
* text-art/selftests.h: New file.
* text-art/style.cc: New file.
* text-art/styled-string.cc: New file.
* text-art/table.cc: New file.
* text-art/table.h: New file.
* text-art/theme.cc: New file.
* text-art/theme.h: New file.
* text-art/types.h: New file.
* text-art/widget.cc: New file.
* text-art/widget.h: New file.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-none.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c: New test.
* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: New test plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.

libcpp/ChangeLog:
* charset.cc (get_cppchar_property): New function template, based
on...
(cpp_wcwidth): ...this function.  Rework to use the above.
Include "combining-chars.inc".
(cpp_is_combining_char): New function
Include "printable-chars.inc".
(cpp_is_printable_char): New function
* combining-chars.inc: New file, generated by
contrib/unicode/gen-combining-chars.py.
* include/cpplib.h (cpp_is_combining_char): New function decl.
(cpp_is_printable_char): New function decl.
* printable-chars.inc: New file, generated by
contrib/unicode/gen-printable-chars.py.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: move handle-multiline-outputs to before check for blank lines

I have followup patches that require checking for multiline patterns
that have blank lines within them, so this moves the handling of
multiline patterns before the check for blank lines, allowing for such
multiline patterns.

Doing so uncovers some issues with existing multiline directives, which
the patch fixes.

gcc/testsuite/ChangeLog:
* c-c++-common/Wlogical-not-parentheses-2.c: Split up the
multiline directive.
* gcc.dg/analyzer/malloc-macro-inline-events.c: Remove redundant
dg-regexp directives.
* gcc.dg/missing-header-fixit-5.c: Split up the multiline
directives.
* lib/gcc-dg.exp (gcc-dg-prune): Move call to
handle-multiline-outputs from prune_gcc_output to here.
* lib/multiline.exp (dg-end-multiline-output): Move call to
maybe-handle-nn-line-numbers from prune_gcc_output to here.
* lib/prune.exp (prune_gcc_output): Move calls to
maybe-handle-nn-line-numbers and handle-multiline-outputs from
here to the above.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

compiler: determine types of Slice_{value,info} expressions

This fixes an accidental omission in the determine types pass.

Test case is https://go.dev/cl/505015.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/504797

Daily bump.

function: Change return type of predicate function from int to bool

Also change some internal variables to bool and some functions to void.

gcc/ChangeLog:

* function.h (emit_initial_value_sets):
Change return type from int to void.
(aggregate_value_p): Change return type from int to bool.
(prologue_contains): Ditto.
(epilogue_contains): Ditto.
(prologue_epilogue_contains): Ditto.
* function.cc (temp_slot): Make "in_use" variable bool.
(make_slot_available): Update for changed "in_use" variable.
(assign_stack_temp_for_type): Ditto.
(emit_initial_value_sets): Change return type from int to void
and update function body accordingly.
(instantiate_virtual_regs): Ditto.
(rest_of_handle_thread_prologue_and_epilogue): Ditto.
(safe_insn_predicate): Change return type from int to bool.
(aggregate_value_p): Change return type from int to bool
and update function body accordingly.
(prologue_contains): Change return type from int to bool.
(prologue_epilogue_contains): Ditto.

c-family: implement -ffp-contract=on

Implement -ffp-contract=on for C and C++ without changing default
behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).

gcc/c-family/ChangeLog:

* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.

gcc/ChangeLog:

* common.opt (fp_contract_mode) [on]: Remove fallback.
* config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
* doc/invoke.texi (-ffp-contract): Update.
* trans-mem.cc (diagnose_tm_1): Skip internal function calls.

Fortran: Fix some bugs in associate [PR87477]

2023-06-21 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/87477
PR fortran/88688
PR fortran/94380
PR fortran/107900
PR fortran/110224
* decl.cc (char_len_param_value): Fix memory leak.
(resolve_block_construct): Remove unnecessary static decls.
* expr.cc (gfc_is_ptr_fcn): New function.
(gfc_check_vardef_context): Use it to permit pointer function
result selectors to be used for associate names in variable
definition context.
* gfortran.h: Prototype for gfc_is_ptr_fcn.
* match.cc (build_associate_name): New function.
(gfc_match_select_type): Use the new function to replace inline
version and to build a new associate name for the case where
the supplied associate name is already used for that purpose.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (gfc_get_symbol_decl): Unlimited polymorphic
variables need deferred initialisation of the vptr.
(gfc_trans_deferred_vars): Do the vptr initialisation.
* trans-stmt.cc (trans_associate_var): Ensure that a pointer
associate name points to the target of the selector and not
the selector itself.

gcc/testsuite/
PR fortran/87477
PR fortran/107900
* gfortran.dg/pr107900.f90 : New test

PR fortran/110224
* gfortran.dg/pr110224.f90 : New test

PR fortran/88688
* gfortran.dg/pr88688.f90 : New test

PR fortran/94380
* gfortran.dg/pr94380.f90 : New test

PR fortran/95398
* gfortran.dg/pr95398.f90 : Set -std=f2008, bump the line
numbers in the error tests by two and change the text in two.

Fortran: Seg fault passing string to type cptr dummy [PR108961].

2023-06-21 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/108961
* trans-expr.cc (gfc_conv_procedure_call): The hidden string
length must not be passed to a formal arg of type(cptr).

gcc/testsuite/
PR fortran/108961
* gfortran.dg/pr108961.f90: New test.

vect: Add testcases for unsigned conversions [PR110018]

Also test convresions with unsigned types.

PR target/110018

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110018-1.c: Use explicit signed types.
* gcc.target/i386/pr110018-2.c: New test.

aarch64: Avoid same input and output Z register for gather loads

The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software Optimization
Guides for Arm cores recommend that as well.
This means that for code like:

svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
  return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}

we'll want to avoid generating the current:
food:
        ld1d    z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
        add     z0.d, z1.d, z0.d
        ret

However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.

This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* Tying the input Z register operand in the original alternatives to 0
* Penalising the original alternatives with '?'

This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable an regular change.

The benchmark numbers on a Neoverse V1 on fprate look okay:
        diff
503.bwaves_r 0.00%
507.cactuBSSN_r 0.00%
508.namd_r 0.00%
510.parest_r 0.55%
511.povray_r 0.22%
519.lbm_r 0.00%
521.wrf_r 0.00%
526.blender_r 0.00%
527.cam4_r 0.56%
538.imagick_r 0.00%
544.nab_r 0.00%
549.fotonik3d_r 0.00%
554.roms_r 0.00%
fprate         0.10%

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Add alternatives to prefer to avoid same input and output Z register.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
* gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.

aarch64: Convert SVE gather patterns to compact syntax

This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.

The lines in some patterns are >80 long now, but I think that's unavoidable
and those patterns already had overly long constraint strings.

No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Convert to compact alternatives syntax.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.

Revert "aarch64: Convert SVE gather patterns to compact syntax"

This reverts commit bb3c69058a5fb874ea3c5c26bfb331d33d0497c3.

Move can_vec_mask_load_store_p and get_len_load_store_mode from "optabs-query" into "optabs-tree"

Since we want both can_vec_mask_load_store_p and get_len_load_store_mode
can see "internal_fn", move these 2 functions into optabs-tree.

gcc/ChangeLog:

* optabs-query.cc (can_vec_mask_load_store_p): Move to optabs-tree.cc.
(get_len_load_store_mode): Ditto.
* optabs-query.h (can_vec_mask_load_store_p): Move to optabs-tree.h.
(get_len_load_store_mode): Ditto.
* optabs-tree.cc (can_vec_mask_load_store_p): New function.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-if-conv.cc: include optabs-tree instead of optabs-query

Less strip_offset in IVOPTs

This avoids one strip_offset use in add_iv_candidate_for_use where
we know it operates on a sizetype quantity.

* tree-ssa-loop-ivopts.cc (add_iv_candidate_for_use): Use
split_constant_offset for the POINTER_PLUS_EXPR case.

Less strip_offset in IVOPTs

This avoids a strip_offset use in record_group_use where we know
it operates on addresses.

* tree-ssa-loop-ivopts.cc (record_group_use): Use
split_constant_offset.

Hide IVOPTs strip_offset

PR110243 shows strip_offset has some correctness issues, the following
avoids using it from loop distribution which can use the more correct
split_constant_offset from data-ref analysis instead. The patch then
un-exports the function from IVOPTs.

* tree-loop-distribution.cc (classify_builtin_st): Use
split_constant_offset.
* tree-ssa-loop-ivopts.h (strip_offset): Remove.
* tree-ssa-loop-ivopts.cc (strip_offset): Make static.