git.ipfire.org Git - thirdparty/gcc.git/log

recog.cc: Correct comments referring to parameter match_len

* recog.cc (peep2_attempt, peep2_update_life): Correct
head-comment description of parameter match_len.

Regenerate gcc.pot

* gcc.pot: Regenerate.

c++: value dependence of by-ref lambda capture [PR108975]

We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses.  Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.

We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.

To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time.  It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P.  This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.

PR c++/108975

gcc/cp/ChangeLog:

* pt.cc (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.

riscv: relax splitter restrictions for creating pseudos

[partial addressing of PR/109279]

RISCV splitters have restrictions to not create pesudos due to a combine
limitatation. And despite this being a split-during-combine limitation,
all split passes take the hit due to way define*_split are used in gcc.

With the original combine issue being fixed 61bee6aed2 ("combine: Don't
record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]")
the RV splitters can now be relaxed.

This improves the codegen in general. e.g.

long long f(void) { return 0x0101010101010101ull; }

Before

li a0,0x01010000
addi a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
ret

With patch

li a5,0x01010000
addi a5,a5,0x0101
mv a0,a5
slli a5,a5,32
add a0,a5,a0
ret

This reduces the qemu icounts, even if slightly, across SPEC2017.

500.perlbench_r 0 1235310737733 1231742384460 0.29%
1 744489708820 743515759958
2 714072106766 712875768625 0.17%
502.gcc_r 0 197365353269 197178223030
1 235614445254 235465240341
2 226769189971 226604663947
3 188315686133 188123584015
4 289372107644 289187945424
503.bwaves_r 0 326291538768 326291539697
1 515809487294 515809488863
2 401647004144 401647005463
3 488750661035 488750662484
505.mcf_r 0 681926695281 681925418147
507.cactuBSSN_r 0 3832240965352 3832226068734
508.namd_r 0 1919838790866 1919832527292
510.parest_r 0 3515999635520 3515878553435
511.povray_r 0 3073889223775 3074758622749
519.lbm_r 0 1194077464296 1194077464041
520.omnetpp_r 0 1014144252460 1011530791131 0.26%
521.wrf_r 0 3966715533120 3966265425092
523.xalancbmk_r 0 1064914296949 1064506711802
525.x264_r 0 509290028335 509258131632
1 2001424246635 2001677767181
2 1914660798226 1914869407575
526.blender_r 0 1726083839515 1725974286174
527.cam4_r 0 2336526136415 2333656336419
531.deepsjeng_r 0 1689007489539 1686541299243 0.15%
538.imagick_r 0 3247960667520 3247942048723
541.leela_r 0 2072315300365 2070248271250
544.nab_r 0 1527909091282 1527906483039
548.exchange2_r 0 2086120304280 2086314757502
549.fotonik3d_r 0 2261694058444 2261670330720
554.roms_r 0 2640547903140 2640512733483
557.xz_r 0 388736881767 386880875636 0.48%
1 959356981818 959993132842
2 547643353034 546374038310 0.23%
997.specrand_fr 0 512881578 512599641
999.specrand_ir 0 512881578 512599641

This is testsuite clean, no regression w/ patch.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
rv64imafdc/  lp64d/ medlow |    2 /     2 |    1 /     1 |    6 /     1 |
   rv64imac/   lp64/ medlow |    3 /     3 |    1 /     1 |   43 /     8 |
rv32imafdc/ ilp32d/ medlow |    1 /     1 |    3 /     2 |    6 /     1 |
   rv32imac/  ilp32/ medlow |    1 /     1 |    3 /     2 |   43 /     8 |

This came up as part of IRC chat on PR/109279 and was suggested by
Andrew Pinski.

gcc/ChangeLog:

* config/riscv/riscv.md: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
* config/riscv/riscv.cc: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
riscv_force_temporary() drop in_splitter arg.
* config/riscv/riscv-protos.h: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Avoid creating useless debug temporaries

insert_debug_temp_for_var_def has some strange code whereby it creates
debug temporaries for SINGLE_RHS (RHS for gimple_assign_single_p) but
not for other RHS in the same situation.

gcc/
* tree-ssa.cc (insert_debug_temp_for_var_def): Do not create
superfluous debug temporaries for single GIMPLE assignments.

tree-optimization/109609 - correctly interpret arg size in fnspec

By majority vote and a hint from the API name which is
arg_max_access_size_given_by_arg_p this interprets a memory access
size specified as given as other argument such as for strncpy
in the testcase which has "1cO313" as specifying the _maximum_
size read/written rather than the exact size. There are two
uses interpreting it that way already and one differing. The
following adjusts the differing and clarifies the documentation.

PR tree-optimization/109609
* attr-fnspec.h (arg_max_access_size_given_by_arg_p):
Clarify semantics.
* tree-ssa-alias.cc (check_fnspec): Correctly interpret
the size given by arg_max_access_size_given_by_arg_p as
maximum, not exact, size.

* gcc.dg/torture/pr109609.c: New testcase.

'omp scan' struct block seq update for OpenMP 5.x

While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence,
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
till requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero or more than one statements.

If there is no preceeding or succeeding executable statement, a warning is
shown.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.

gcc/fortran/ChangeLog:

* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
or more than one exec statements before/after 'omp scan'.
* trans-openmp.cc (gfc_trans_omp_do): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/scan-1.c: New test.
* testsuite/libgomp.c/scan-23.c: New test.
* testsuite/libgomp.fortran/scan-2.f90: New test.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
* gfortran.dg/gomp/loop-2.f90: Likewise.
* gfortran.dg/gomp/reduction5.f90: Likewise.
* gfortran.dg/gomp/reduction6.f90: Likewise.
* gfortran.dg/gomp/scan-1.f90: Likewise.
* gfortran.dg/gomp/taskloop-2.f90: Likewise.
* c-c++-common/gomp/scan-6.c: New test.
* gfortran.dg/gomp/scan-8.f90: New test.

testsuite: Fix up ext-floating2.C on powerpc64-linux

Another testcase that is failing on powerpc64-linux.  The test expects
a diagnostics when float64 && float128 or in another spot when
float32 && float128.  Now, float128 effective target is satisfied on
powerpc64-linux, despite __CPP_FLOAT128_T__ not being defined, because
one needs to add some extra options for it.  I think 32-bit arm has
similar case for float16.

2023-04-25  Jakub Jelinek  <jakub@redhat.com>

* g++.dg/cpp23/ext-floating2.C: Add dg-add-options for
float16, float32, float64 and float128.

aarch64: PR target/PR99195 Annotate more simple integer binary patterns with vcz subst rules

This patch adds more straightforward annotations to some more integer binary ops to
eliminate redundant fmovs around 64-bit SIMD results.

Bootstrapped and tested on aarch64-none-linux.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (orn<mode>3): Rename to...
(orn<mode>3<vczle><vczbe>): ... This.
(bic<mode>3): Rename to...
(bic<mode>3<vczle><vczbe>): ... This.
(<su><maxmin><mode>3): Rename to...
(<su><maxmin><mode>3<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for orn, bic, max and min.

aarch64: Implement V2DI,V4SI division optabs for TARGET_SVE

Similar to the mulv2di case, we can use SVE instruction to implement the V4SI and V2DI optabs
for signed and unsigned integer division.
This allows us to generate much cleaner code for the testcase than the current:
food:
        fmov    x1, d1
        fmov    x0, d0
        umov    x2, v0.d[1]
        sdiv    x0, x0, x1
        umov    x1, v1.d[1]
        sdiv    x1, x2, x1
        fmov    d0, x0
        ins     v0.d[1], x1
        ret
which now becomes:
food:
        ptrue   p0.b, all
        sdiv    z0.d, p0/m, z0.d, z1.d
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3): New define_expand.
* config/aarch64/iterators.md (VQDIV): New mode iterator.
(vnx2di): New mode attribute.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve-neon-modes_3.c: New test.

testsuite: Fix up ext-floating15.C tests on powerpc64-linux [PR109278]

I've noticed this test FAILs on powerpc64-linux, with
FAIL: g++.dg/cpp23/ext-floating15.C -std=gnu++98 (test for excess errors)
Excess errors:
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:1: error: variable or field 'bar' declared void
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:6: error: expected primary-expression before '_Float128'
and similarly other std versions.
powerpc64-linux is float128 target, but needs to add some options for it.

Fixed by adding them.

2023-04-25 Jakub Jelinek <jakub@redhat.com>

PR c++/109278
* g++.dg/cpp23/ext-floating15.C: Add dg-add-options float128.

rtl-optimization/109585 - alias analysis typo

When r10-514-gc6b84edb6110dd2b4fb improved access path analysis
it introduced a typo that triggers when there's an access to a
trailing array in the first access path leading to false
disambiguation.

PR rtl-optimization/109585
* tree-ssa-alias.cc (aliasing_component_refs_p): Fix typo.

* gcc.dg/torture/pr109585.c: New testcase.

powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]

The following testcase reduced from newlib ICEs on powerpc-linux,
with -O2 -m32 -mpowerpc64 since r12-6433 PR102239 optimization was
added and on the original testcase since some ranger improvements in
GCC 13 made it no longer latent on newlib.
The problem is that the *branch_anddi3_dot define_insn_and_split
relies on the *rotldi3_mask_dot define_insn_and_split being recognized
during splitting.  The rs6000_is_valid_rotate_dot_mask function checks whether
the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot in
addition to checking that it is a valid mask also has
  (<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff)
test in the condition.  For TARGET_64BIT that doesn't add any further
requirements, but for !TARGET_64BIT && TARGET_POWERPC64 if the AND
second operand is larger than INT_MAX it will not be recognized.

The rs6000_is_valid_rotate_dot_mask function is used solely in one spot,
condition of *branch_anddi3_dot, so the following patch adjusts it
to check for that as well.

2023-04-25  Jakub Jelinek  <jakub@redhat.com>

PR target/109566
* config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For
!TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb)
is larger than signed int maximum.

* gcc.target/powerpc/pr109566.c: New test.

gcov: add info about "calls" to JSON output format

gcc/ChangeLog:

* doc/gcov.texi: Document the new "calls" field and document
the API bump. Mention also "block_ids" for lines.
* gcov.cc (output_intermediate_json_line): Output info about
calls and extend branches as well.
(generate_results): Bump version to 2.
(output_line_details): Use block ID instead of a non-sensual
index.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
* g++.dg/gcov/test-gcov-17.py: Cover new format.
* lib/gcov.exp: Add options for gcov that emit the extra info.

[Committed] Correct zeroextendqihi2 insn length regression on xstormy16.

My recent tweak to the zeroextendqihi2 pattern on xstormy16 incorrectly
handled the case where the operand was a MEM.  MEM operands use a longer
encoding than REG operands, and the incorrect instruction length resulted
in assembler errors (as reported by Jeff Law).  This patch restores the
original length resolving this regression.  Sorry for the inconvenience.
Committed as obvious, after testing that a cross-compiler to xstormy16-elf
builds from x86_64-pc-linux-gnu, and that gcc.c-torture/execute/memset-2.c
no longer causes "operand out of range" issues in gas.  Committed as
obvious.

2023-04-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (zero_extendqihi2): Restore/fix
length attribute for the first (memory operand) alternative.

aarch64: Leveraging the use of STP instruction for vec_duplicate

The backend pattern for storing a pair of identical values in 32 and
64-bit modes with the machine instruction STP was missing, and
multiple instructions were needed to reproduce this behavior as a
result of failed RTL pattern match in combine pass.

For the test case:

typedef long long v2di __attribute__((vector_size (16)));
typedef int v2si __attribute__((vector_size (8)));

void
foo (v2di *x, long long a)
{
  v2di tmp = {a, a};
  *x = tmp;
}

void
foo2 (v2si *x, int a)
{
  v2si tmp = {a, a};
  *x = tmp;
}

at -O2 on aarch64 gives:

foo:
    stp x1, x1, [x0]
    ret
foo2:
    stp w1, w1, [x0]
    ret

instead of:

foo:
        dup     v0.2d, x1
        str     q0, [x0]
        ret
foo2:
        dup     v0.2s, w1
        str     d0, [x0]
        ret

Bootstrapped and regtested on aarch64-none-linux-gnu.

gcc/
* config/aarch64/aarch64-simd.md(aarch64_simd_stp<mode>): New.
* config/aarch64/constraints.md: Make "Umn" relaxed memory
constraint.
* config/aarch64/iterators.md(ldpstp_vel_sz): New.

gcc/testsuite/
* gcc.target/aarch64/stp_vec_dup_32_64-1.c: New.

Remove default constructor to nan_state.

I think it's best to specify the default behavior of nan_state, since
it's not obvious that nan_state() defaults to TRUE. Also, this avoids
the ugly nan_state(false, false) idiom.

gcc/ChangeLog:

* value-range.cc (frange::set): Adjust constructor.
* value-range.h (nan_state::nan_state): Replace default
constructor with one taking an argument.

MAINTAINERS: add myself to write after approval

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.

Remove obsolete configure code in gnattools

It was recently pointed out that we generate symbolic links to ghost files
when building the GNAT tools, as the mlib-tgt-specific-*.adb files are gone.

gnattools/
* configure.ac (TOOLS_TARGET_PAIRS): Remove obsolete settings.
(EXTRA_GNATTOOLS): Likewise.
* configure: Regenerate.

Pass correct type to irange::contains_p() in ipa-cp.cc.

There is a call to contains_p() in ipa-cp.cc which passes incompatible
types. This currently works because deep in the call chain, the legacy
code uses tree_int_cst_lt which performs the operation with
widest_int. With the upcoming removal of legacy, contains_p() will be
stricter.

gcc/ChangeLog:

* ipa-cp.cc (ipa_range_contains_p): New.
(decide_whether_version_node): Use it.

[PATCH v2] testsuite: Add testcase for sparc ICE [PR105573]

r11-10018-g33914983cf3734c2f8079963ba49fcc117499ef3 fixed PR105312 and added
a test case for target/arm but the duplicate PR105573 has a test case for
target/sparc that was uncommitted until now.

2023-04-21 Sam James <sam@gentoo.org>

PR tree-optimization/105312
PR target/105573
gcc/testsuite/
* gcc.target/sparc/pr105573.c: New test.

Add alternative testcase of phi-opt-25.c that tests phiopt

Right now phi-opt-25.c has tests like `a ? func(a) : CST`
but if we add the simplifications to match.pd, then phi-opt-25.c
will no longer be testing phiopt to make sure these get optimized.
So this adds an alternative version which is designed to test
phiopt.

Committed as obvious after testing the testcase to make sure it does not
fail on x86_64-linux-gnu.

Thanks,
Andrew Pinski

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-25a.c: New test.

Daily bump.

[SVE] Fold svrev(svrev(v)) to v.

gcc/ChangeLog:
* tree-ssa-forwprop.cc (is_combined_permutation_identity): Try to
simplify two successive VEC_PERM_EXPRs with same VLA mask,
where mask chooses elements in reverse order.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/rev-1.c: New test.

Update gcc hr.po, sv.po, zh_CN.po

* hr.po, sv.po, zh_CN.po: Update.

libstdc++: Fix __max_diff_type::operator>>= for negative values

This patch fixes sign bit propagation when right-shifting a negative
__max_diff_type value by more than one, a bug that our existing test
coverage didn't expose until r14-159-g03cebd304955a6 fixed the front
end's 'signed typedef-name' handling that the test relies on (which is
a non-standard extension to the language grammar).

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h (__max_diff_type::operator>>=):
Fix propagation of sign bit.
* testsuite/std/ranges/iota/max_size_type.cc: Avoid using the
non-standard 'signed typedef-name'. Add some compile-time tests
for right-shifting a negative __max_diff_type value by more than
one.

PHIOPT: Add support for diamond shaped bb to match_simplify_replacement

This adds diamond shaped form of basic blocks to match_simplify_replacement.
This is the patch is the start of removing/moving all
of what minmax_replacement does to match.pd to reduce the code duplication.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note phi-opt-{23,24}.c testcase had an incorrect xfail as there should
have been 2 if still because f4/f5 would not be transformed as -ABS is
not allowable during early phi-opt.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (match_simplify_replacement): Add new arguments
and support diamond shaped basic block form.
(tree_ssa_phiopt_worker): Update call to match_simplify_replacement

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-23.c: Update testcase.
* gcc.dg/tree-ssa/phi-opt-24.c: Likewise.

PHIOPT: Ignore predicates for match-and-simplify phi-opt

This fixes a missed optimization where early phi-opt would
not work when there was predicates. The easiest fix is
to change empty_bb_or_one_feeding_into_p to ignore those
statements while checking for only feeding statement.

Note phi-opt-23.c and phi-opt-24.c still fail as we don't handle
diamond form in match_and_simplify phiopt yet.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Instead of calling last_and_only_stmt, look for the last statement
manually.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-13.c: Add -fno-ssa-phiopt.

PHIOPT: Factor out some code from match_simplify_replacement

This factors out the code checking if we have an empty bb
or one statement that feeds into the phi so it can be used
when adding diamond shaped bb form to match_simplify_replacement
in the next patch. Also allows for some improvements
in the next patches too.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
New function.
(match_simplify_replacement): Call
empty_bb_or_one_feeding_into_p instead of doing it inline.

PHIOPT: Allow other diamond uses when do_hoist_loads is true

While working on adding diamond shaped form to match-and-simplify
phiopt, I Noticed that we would not reach there if do_hoist_loads
was true. In the original code before the cleanups it was not
obvious why but after I finished the cleanups, it was just a matter
of removing a continue and that is what this patch does.

This just happens also to fix a bug report that I noticed too.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/68894
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove the
continue for the do_hoist_loads diamond case.

PHIOPT: Cleanup tree_ssa_phiopt_worker code

This patch cleans up tree_ssa_phiopt_worker by merging
common code. Making do_store_elim handled earlier.
Note this does not change any overall logic of the code,
just moves code around enough to be able to do this.
This will make it easier to move code around even more
and a few other fixes I have.
Plus I think all of the do_store_elim code really
should move to its own function as how much code is shared
is now obvious not much.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Rearrange
code for better code readability.

PHIOPT: Move check on diamond bb to tree_ssa_phiopt_worker from minmax_replacement

This moves the check to make sure on the diamond shaped form bbs that
the the two middle bbs are only for that diamond shaped form earlier
in the shared code.
Also remove the redundant check for single_succ_p since that was already
done before hand.
The next patch will simplify the code even further and remove redundant
checks.

PR tree-optimization/109604

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the
diamond form check from ...
(minmax_replacement): Here.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr109604-1.c: New test.
* gcc.c-torture/compile/pr109604-2.c: New test.

c++, tree: declare some basic functions inline

The functions strip_array_types, is_typedef_decl, typedef_variant_p
and cp_expr_location are used throughout the C++ front end including in
some fairly hot parts (e.g. in the tsubst routines and cp_walk_subtree)
and they're small enough that the overhead of calling them out-of-line
is relatively significant.

So this patch moves their definitions into the appropriate headers to
enable inlining them.

gcc/cp/ChangeLog:

* cp-tree.h (cp_expr_location): Define here.
* tree.cc (cp_expr_location): Don't define here.

gcc/ChangeLog:

* tree.cc (strip_array_types): Don't define here.
(is_typedef_decl): Don't define here.
(typedef_variant_p): Don't define here.
* tree.h (strip_array_types): Define here.
(is_typedef_decl): Define here.
(typedef_variant_p): Define here.

Docs, OpenMP: Small fixes to internal OMP_FOR doc.

gcc/ChangeLog:

* doc/generic.texi (OpenMP): Add != to allowed
conditions and state that vars can be unsigned.

* tree.def (OMP_FOR): Likewise.

aarch64: Add mulv2di3 expander for TARGET_SVE

Motivated by a recent LLVM patch I saw, we can use SVE for 64-bit vector integer MUL (plain Advanced SIMD doesn't support it).
Since the Advanced SIMD regs are just the low 128-bit part of the SVE regs it all works transparently.
It's a reasonably straightforward implementation of the mulv2di3 optab that wires it up through the mulvnx2di3 expander and
subregs the results back to the Advanced SIMD modes.

There's more such tricks possible with other operations (and we could do 64-bit multiply-add merged operations too) but for now
this self-contained patch improves the mul case as without it for the testcases in the patch we'd have scalarised the arguments,
moved them to GP regs, performed two GP MULs and moved them back to SIMD regs.
Advertising a mulv2di3 optab from the backend should also allow for more flexibile vectorisation opportunities.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (mulv2di3): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve-neon-modes_1.c: New test.
* gcc.target/aarch64/sve-neon-modes_2.c: New test.

MAINTAINERS: fix sorting of names

ChangeLog:

* MAINTAINERS: Fix sorting.

doc: Update install.texi for GCC 13

install.texi needs some updates for GCC 13 and trunk:

* We used a mixture of Solaris 2 and Solaris references.  Since Solaris
  1/SunOS 4 is ancient history by now, consistently use Solaris
  everywhere.  Likewise, explicit references to Solaris 11 can go in
  many places since Solaris 11.3 and 11.4 is all GCC supports.

* Some caveats apply to both Solaris/SPARC and x86, like the difference
  between as and gas.

* Some specifics are obsolete, like the /usr/ccs/bin path whose contents
  was merged into /usr/bin in Solaris 11.0 already.  Likewise, /bin/sh
  is ksh93 since Solaris 11.0, so there's no need to explicitly use
  /bin/ksh.

* I've removed the reference to OpenCSW: there's barely a need for external
  sites to get additional packages.  OpenCSW is mostly unmaintained these
  days and has been found to be rather harmful then helping.

* The section on assembler and linker to use was partially duplicated.
  Better keep the info in one place.

* GNAT is bundled in recent Solaris 11.4 updates, so recommend that.

Tested on i386-pc-solaris2.11 with make doc/gccinstall.{info,pdf} and
inspection of the latter.

2023-04-21  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* doc/install.texi: Consistently use Solaris rather than Solaris 2.
Remove explicit Solaris 11 references.
Markup fixes.
(Options specification, --with-gnu-as): as and gas always differ
on Solaris.
Remove /usr/ccs/bin reference.
(Installing GCC: Binaries, Solaris (SPARC, Intel)): Remove.
(i?86-*-solaris2*): Merge assembler, linker recommendations ...
(*-*-solaris2*): ... here.
Update bundled GCC versions.
Don't refer to pre-built binaries.
Remove /bin/sh warning.
Update assembler, linker recommendations.
Document GNAT bootstrap compiler.
(sparc-sun-solaris2*): Remove non-UltraSPARC reference.
(sparc64-*-solaris2*): Move content...
(sparcv9-*-solaris2*): ...here.
Add GDC for 64-bit bootstrap compilers.

aarch64: PR target/109406 Add support for SVE2 unpredicated MUL

SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders
without using up a predicate registers. This patch does so.
As the SVE MUL expansion currently is templated away through a code iterator I did not split it
off just for this case but instead special-cased it in the define_expand. It seemed somewhat less
invasive than the alternatives but I could split it off more explicitly if others want to.
The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

PR target/109406
* config/aarch64/aarch64-sve.md (<optab><mode>3): Handle TARGET_SVE2 MUL
case.
* config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_<mode>): New
pattern.

gcc/testsuite/ChangeLog:

PR target/109406
* gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2
MUL.
* gcc.target/aarch64/sve2/unpred_mul_1.c: New test.

[4/4] aarch64: Convert UABAL2 and SABAL2 patterns to standard RTL codes

The final patch in the series tackles the most complex of this family of patterns, UABAL2 and SABAL2.
These extract the high part of the sources, perform an absdiff on them, widen the result and accumulate.
The motivating testcase for this patch (series) is included and the simplification required doesn't actually
trigger with just the RTL pattern change because rtx_costs block it.
So this patch also extends rtx costs to recognise the (minus (smax (x, y) (smin (x, y)))) expression we use
to describe absdiff in the backend and avoid recursing into its arms.

This allows us to generate the single-instruction sequence expected here.
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur>abal2<mode>): Rename to...
(aarch64_<su>abal2<mode>_insn): ... This. Use RTL codes instead of unspec.
(aarch64_<su>abal2<mode>): New define_expand.
* config/aarch64/aarch64.cc (aarch64_abd_rtx_p): New function.
(aarch64_rtx_costs): Handle ABD rtxes.
* config/aarch64/aarch64.md (UNSPEC_SABAL2, UNSPEC_UABAL2): Delete.
* config/aarch64/iterators.md (ABAL2): Delete.
(sur): Remove handling of UNSPEC_UABAL2 and UNSPEC_SABAL2.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vabal_combine.c: New test.

[3/4] aarch64: Convert UABAL and SABAL patterns to standard RTL codes

With the SABDL and UABDL patterns converted, the accumulating forms of them UABAL and SABAL are not much more complicated.
There's an accumulator argument that we, err, accumulate into with a PLUS once all the widening is done.
Some necessary renaming of patterns relating to the removal of UNSPEC_SABAL and UNSPEC_UABAL is included.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>): Rename to...
(aarch64_<su>abal<mode>): ... This.  Use RTL codes instead of unspec.
(<sur>sadv16qi): Rename to...
(<su>sadv16qi): ... This.  Adjust for the above.
* config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to...
(<su>sad<vsi2qi>): ... This.  Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete.
* config/aarch64/iterators.md (ABAL): Delete.
(sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.

[2/4] aarch64: Convert UABDL2 and SABDL2 patterns to standard RTL codes

Similar to the previous patch for UABDL and SABDL, this patch covers the *2 versions that vec_select the high half
of its input to do the asbsdiff and extend. A define_expand is added for the intrinsic to create the "select-high-half" RTX the pattern expects.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>): Rename to...
(aarch64_<su>abdl2<mode>_insn): ... This. Use RTL codes instead of unspec.
(aarch64_<su>abdl2<mode>): New define_expand.
* config/aarch64/aarch64.md (UNSPEC_SABDL2, UNSPEC_UABDL2): Delete.
* config/aarch64/iterators.md (ABDL2): Delete.
(sur): Remove handling of UNSPEC_SABDL2 and UNSPEC_UABDL2.

[1/4] aarch64: Convert UABDL and SABDL patterns to standard RTL codes

This is the first patch in a series to improve the RTL representation of the sum-of-absolute-differences patterns
in the backend. We can use standard RTL codes and remove some unspecs.
For UABDL and SABDL we have a widening of the result so we can represent uabdl (x, y) as (zero_extend (minus (smax (x, y) (smin (x, y)))))
and sabdl (x, y) as (zero_extend (minus (umax (x, y) (umin (x, y))))).
It is important to use zero_extend rather than sign_extend for the sabdl case, as the result of the absolute difference is still a positive unsigned value
(the signedness of the operation refers to the values being diffed, not the absolute value of the difference) that must be zero-extended.

Bootstrapped and tested on aarch64-none-linux-gnu (these intrinsics are reasonably well-covered by the advsimd-intrinsics.exp tests)

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl<mode>): Rename to...
(aarch64_<su>abdl<mode>): ... This. Use standard RTL ops instead of
unspec.
* config/aarch64/aarch64.md (UNSPEC_SABDL, UNSPEC_UABDL): Delete.
* config/aarch64/iterators.md (ABDL): Delete.
(sur): Remove handling of UNSPEC_SABDL and UNSPEC_UABDL.

aarch64: Add pattern to match zero-extending scalar result of ADDLV

The vaddlv_u8 and vaddlv_u16 intrinsics produce a widened scalar result (uint16_t and uint32_t).
The ADDLV instructions themselves zero the rest of the V register, which gives us a free zero-extension
to 32 and 64 bits, similar to how it works on the GP reg side.
Because we don't model that zero-extension in the machine description this can cause GCC to move the
results of these instructions to the GP regs just to do a (superfluous) zero-extension.
This patch just adds a pattern to catch these cases. For the testcases we can now generate no zero-extends
or GP<->FP reg moves, whereas before we generated stuff like:
foo_8_32:
        uaddlv  h0, v0.8b
        umov    w1, v0.h[0] // FP<->GP move with zero-extension!
        str     w1, [x0]
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/addlv_zext.c: New test.

This replaces uses of last_stmt where we do not require debug skipping

There are quite some cases which want to access the control stmt
ending a basic-block. Since there cannot be debug stmts after
such stmt there's no point in using last_stmt which skips debug
stmts and can be a compile-time hog for larger testcases.

* gimple-ssa-split-paths.cc (is_feasible_trace): Avoid
last_stmt.
* graphite-scop-detection.cc (single_pred_cond_non_loop_exit):
Likewise.
* ipa-fnsummary.cc (set_cond_stmt_execution_predicate): Likewise.
(set_switch_stmt_execution_predicate): Likewise.
(phi_result_unknown_predicate): Likewise.
* ipa-prop.cc (compute_complex_ancestor_jump_func): Likewise.
(ipa_analyze_indirect_call_uses): Likewise.
* predict.cc (predict_iv_comparison): Likewise.
(predict_extra_loop_exits): Likewise.
(predict_loops): Likewise.
(tree_predict_by_opcode): Likewise.
* gimple-predicate-analysis.cc (predicate::init_from_control_deps):
Likewise.
* gimple-pretty-print.cc (dump_implicit_edges): Likewise.
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Likewise.
(replace_phi_edge_with_variable): Likewise.
(two_value_replacement): Likewise.
(value_replacement): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_removal_in_builtin_zero_pattern): Likewise.
* tree-ssa-reassoc.cc (maybe_optimize_range_tests): Likewise.
* tree-ssa-sccvn.cc (vn_phi_eq): Likewise.
(vn_phi_lookup): Likewise.
(vn_phi_insert): Likewise.
* tree-ssa-structalias.cc (compute_points_to_sets): Likewise.
* tree-ssa-threadbackward.cc (back_threader::maybe_thread_block):
Likewise.
(back_threader_profitability::possibly_profitable_path_p):
Likewise.
* tree-ssa-threadedge.cc (jump_threader::thread_outgoing_edges):
Likewise.
* tree-switch-conversion.cc (pass_convert_switch::execute):
Likewise.
(pass_lower_switch<O0>::execute): Likewise.
* tree-tailcall.cc (tree_optimize_tail_calls_1): Likewise.
* tree-vect-loop-manip.cc (vect_loop_versioning): Likewise.
* tree-vect-slp.cc (vect_slp_function): Likewise.
* tree-vect-stmts.cc (cfun_returns): Likewise.
* tree-vectorizer.cc (vect_loop_vectorized_call): Likewise.
(vect_loop_dist_alias_call): Likewise.

Avoid repeated forwarder_block_p calls in CFG cleanup

CFG cleanup maintains BB_FORWARDER_BLOCK and uses FORWARDER_BLOCK_P
to check that apart from two places which use forwarder_block_p
in outgoing_edges_match alongside many BB_FORWARDER_BLOCK uses.

The following adjusts those.

* cfgcleanup.cc (outgoing_edges_match): Use FORWARDER_BLOCK_P.

RISC-V: Eliminate redundant vsetvli for duplicate AVL def

This patch is the V2 patch:https://patchwork.sourceware.org/project/gcc/patch/20230328010124.235703-1-juzhe.zhong@rivai.ai/

Address comments from Jeff. Add comments for all_avail_in_compatible_p and refine comments of codes.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_infos_manager::all_avail_in_compatible_p): New function.
(pass_vsetvl::refine_vsetvls): Optimize vsetvls.
* config/riscv/riscv-vsetvl.h: New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.

RISC-V: Add function comment for cleanup_insns.

Add more comment for cleanup_insns.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::pre_vsetvl): Add function
comment for cleanup_insns.

RISC-V: Optimize fault only first load

V2 patch for: https://patchwork.sourceware.org/project/gcc/patch/20230330012804.110539-1-juzhe.zhong@rivai.ai/
which has been reviewed.

This patch address Jeff's comment, refine ChangeLog to give more
clear information.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: New unspec to refine fault first load pattern.
* config/riscv/vector.md: Refine fault first load pattern to erase avl from instructions
with the fault first load property.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/ffload-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-5.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-6.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-7.c: New test.

Add testcases for ffs/ctz vectorization.

gcc/testsuite/ChangeLog:

PR tree-optimization/109011
* gcc.target/i386/pr109011-b1.c: New test.
* gcc.target/i386/pr109011-b2.c: New test.
* gcc.target/i386/pr109011-d1.c: New test.
* gcc.target/i386/pr109011-d2.c: New test.
* gcc.target/i386/pr109011-q1.c: New test.
* gcc.target/i386/pr109011-q2.c: New test.
* gcc.target/i386/pr109011-w1.c: New test.
* gcc.target/i386/pr109011-w2.c: New test.

Daily bump.

modula2: Add -lnsl -lsocket libraries to gcc/testsuite/lib/gm2.exp

Solaris requires -lnsl -lsocket (present in the driver) but not when
running the testsuite. This patch tests target for *-*-solaris2
and conditionally appends the above libraries.

gcc/testsuite/ChangeLog:

* lib/gm2.exp (gm2_target_compile_default): Conditionally
append -lnsl -lsocket to ldflags.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

aarch64: Annotate fcvtn pattern for vec_concat with zeroes

Using the define_substs in aarch64-simd.md this is a straightforward annotation to remove
a redundant fmov insn.

So the codegen goes from:
foo_d:
        fcvtn   v0.2s, v0.2d
        fmov    d0, d0
        ret

to the simple:
foo_d:
        fcvtn   v0.2s, v0.2d
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_): Rename to...
(aarch64_float_truncate_lo_<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/float_truncate_zero.c: New test.

aarch64: Add vect_concat with zeroes annotation to addp pattern

Similar to others, the addp pattern can be safely annotated with <vczle><vczbe> to create
the implicit vec_concat-with-zero variants.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_addp<mode>): Rename to...
(aarch64_addp<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for vpadd intrinsics.

[xstormy16] Update xstormy16_rtx_costs.

This patch provides an improved rtx_costs target hook on xstormy16.
The current implementation has the unfortunate property that it claims
that zero_extendhisi2 is very cheap, even though the machine description
doesn't provide that instruction/pattern. Doh! Rewriting the
xstormy16_rtx_costs function has additional benefits, including
making more use of the (short) "mul" instruction when optimizing
for size with -Os.

2023-04-23 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.cc (xstormy16_rtx_costs): Rewrite to
provide reasonable values for common arithmetic operations and
immediate operands (in several machine modes).

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/mulhi.c: New test case.

[xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.md

This patch adds a pair of define_insn patterns to the xstormy16 machine
description that provide extendhisi2 and zero_extendhisi2, i.e. 16-bit
to 32-bit sign- and zero-extension respectively.  This functionality is
already synthesized during RTL expansion, but providing patterns allow
the semantics to be exposed to the RTL optimizers.  To simplify things,
this patch introduces a new %h0 output format, for emitting the high_part
register name of a double-word (SImode) register pair.  The actual
code generated is identical to before.

Whilst there, I also fixed the instruction lengths and formatting of
the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
alternative to zero_extendqihi2 that takes advantage of its efficient
instruction encoding.

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
format specifier to output high_part register name of SImode reg.
* config/stormy16/stormy16.md (extendhisi2): New define_insn.
(zero_extendqihi2): Fix lengths, consistent formatting and add
"and Rx,#255" alternative, for documentation purposes.
(zero_extendhisi2): New define_insn.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/extendhisi2.c: New test case.
* gcc.target/xstormy16/zextendhisi2.c: Likewise.

[xstormy16] Improved SImode shifts by two bits.

Currently on xstormy16 SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions.  This patch implements the obvious optimization
that shifts by two bits can be done in four instructions, by using two
single-bit sequences.

Hence, ashift_2 was previously generated as:
        mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
        ret
and with this patch we now generate:
        shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
        ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/shiftsi.c: New test case.

Handle NANs in frange::operator== [PR109593]

This patch...
commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015
Return true from operator== for two identical ranges containing NAN.

removed the check for NANs, which caused us to read from m_min and
m_max which are undefined for NANs.

gcc/ChangeLog:

PR tree-optimization/109593
* value-range.cc (frange::operator==): Handle NANs.

Adjust testcases after better RA decision.

After optimization for RA, memory op is not propagated into
instructions(>1), and it make testcases not generate vxorps since
the memory is loaded into the dest, and the dest is never unused now.

So rewrite testcases to make the codegen more stable.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite
testcase to make the codegen more stable.
* gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto
* gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
* gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
* gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.

Use NO_REGS in cost calculation when the preferred register class are not known yet.

gcc/ChangeLog:

PR rtl-optimization/108707
* ira-costs.cc (scan_one_insn): Use NO_REGS instead of
GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108707.c: New test.

Daily bump.

PHIOPT: Improve readability of tree_ssa_phiopt_worker

This small patch just changes around the code slightly to
make it easier to understand that the cases were handling diamond
shaped BB for both do_store_elim/do_hoist_loads.
There is no effect on code output at all since all of the checks
are the same still.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker):
Change the code around slightly to move diamond
handling for do_store_elim/do_hoist_loads out of
the big if/else.

PHIOPT: Improve minmax diamond detection for phiopt1

For diamond bb phi node detection, there is a check
to make sure bb1 is not empty. But in the case where
bb1 is empty except for a predicate, empty_block_p
will still return true but the minmax code handles
that case already so there is no reason to check
if the basic block is empty.

This patch removes that check and removes some
xfails.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker):
Remove check on empty_block_p.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-5.c: Remvoe some xfail.

[Committed] Move new test case to gcc.target/avr/mmcu/pr54816.c

AVR test cases that specify a specific -mmcu option need to be placed
in the gcc.target/avr/mmcu subdirectory. Moved thusly.

2023-04-22 Roger Sayle <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
PR target/54816
* gcc.target/avr/pr54816.c: Move to...
* gcc.target/avr/mmcu/pr54816.c: ... here.

Fortran: function results never have the ALLOCATABLE attribute [PR109500]

Fortran 2018 8.5.3 (ALLOCATABLE attribute) explains in Note 1 that the
result of referencing a function whose result variable has the ALLOCATABLE
attribute is a value that does not itself have the ALLOCATABLE attribute.

gcc/fortran/ChangeLog:

PR fortran/109500
* interface.cc (gfc_compare_actual_formal): Reject allocatable
functions being used as actual argument for allocable dummy.

gcc/testsuite/ChangeLog:

PR fortran/109500
* gfortran.dg/allocatable_function_11.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>

testsuite: Fix up pr109011-*.c tests for powerpc [PR109572]

As reported, pr109011-{4,5}.c tests fail on powerpc.
I thought they should have the same counts as the corresponding -{2,3}.c
tests, the only difference is that -{2,3}.c are int while -{4,5}.c are
long long.  But there are 2 issues.  One is that in the foo
function the vectorization costs comparison triggered in, while in -{2,3}.c
we use vectorization factor 4 and it was found beneficial, when using
long long it was just vf 2 and the scalar cost of doing
p[i] = __builtin_ctzll (q[i]) twice looked smaller than the vectorizated
statements.  I could disable the cost model, but instead chose to add
some further arithmetics to those functions to make it beneficial even
with vf 2.
After that change, pr109011-4.c still failed; I was expecting 4 .CTZ calls
there on power9, 3 vectorized and one in scalar code, but for some reason
the scalar one didn't trigger.  As I really want to count just the
vectorized calls, I've added the vect prefix on the variables to ensure
I'm only counting vectorized calls and decreased the 4 counts to 3.

2023-04-22  Jakub Jelinek  <jakub@redhat.com>

PR testsuite/109572
* gcc.dg/vect/pr109011-1.c: In scan-tree-dump-times regexps match also
vect prefix to make sure we only count vectorized calls.
* gcc.dg/vect/pr109011-2.c: Likewise.  On powerpc* expect just count 3
rather than 4.
* gcc.dg/vect/pr109011-3.c: In scan-tree-dump-times regexps match also
vect prefix to make sure we only count vectorized calls.
* gcc.dg/vect/pr109011-4.c: Likewise.  On powerpc* expect just count 3
rather than 4.
(foo): Add 2 further arithmetic ops to the loop to make it appear
worthwhile for vectorization heuristics on powerpc.
* gcc.dg/vect/pr109011-5.c: In scan-tree-dump-times regexps match also
vect prefix to make sure we only count vectorized calls.
(foo): Add 2 further arithmetic ops to the loop to make it appear
worthwhile for vectorization heuristics on powerpc.

Fix up bootstrap with GCC 4.[89] after RAII auto_mpfr and autp_mpz [PR109589]

On Tue, Apr 18, 2023 at 03:39:41PM +0200, Richard Biener via Gcc-patches wrote:
> The following adds two RAII classes, one for mpz_t and one for mpfr_t
> making object lifetime management easier. Both formerly require
> explicit initialization with {mpz,mpfr}_init and release with
> {mpz,mpfr}_clear.

This unfortunately broke bootstrap when using GCC 4.8.x or 4.9.x as
it uses deleted friends which weren't supported until PR62101 fixed
them in 2014 for GCC 5.

The following patch adds an workaround, not deleting those friends
for those old versions.
While it means if people add those mp*_{init{,2},clear} calls on auto_mp*
objects they won't notice when doing non-bootstrap builds using
very old system compilers, people should be bootstrapping their changes
and it will be caught during bootstraps even when starting with those
old compilers, plus most people actually use much newer compilers
when developing.

2023-04-22 Jakub Jelinek <jakub@redhat.com>

PR bootstrap/109589
* system.h (class auto_mpz): Workaround PR62101 bug in GCC 4.8 and 4.9.
* realmpfr.h (class auto_mpfr): Likewise.

Adjust rx movsicc tests

The rx port has target specific test movsicc which is naturally meant to verify
that if-conversion is happening on the expected cases.

Unfortunately the test is poorly written.  The core problem is there are 8
distinct tests and each of those tests is expected to generate a specific
sequence.  Unfortunately, various generic bits might turn an equality test
into an inequality test or make other similar changes.

The net result is the assembly matching patterns may find a particular sequence,
but it may be for a different function than was originally intended.  ie,
test1's output may match the expected assembly for test5.  Ugh!

This patch breaks the movsicc test down into 8 distinct tests and adjusts the
patterns they match.  The nice thing is all these tests are supposed to have
branches that use a bCC 1f form.  So we can make them a bit more robust by
ignoring the actual condition code used.  So if we change eq to ne, as long
as we match the movsicc pattern, we're OK.  And the 1f style is only used by
the movsicc pattern.

With the tests broken down it's a lot easier to diagnose why one test fails
after the recent changes to if-conversion.  movsicc-3 fails because of the
profitability test.  It's more expensive than the other cases because of its
use of (const_int 10) rather than (const_int 0).  (const_int 0) naturally has
a smaller cost.

It looks to me like in this context (const_int 10) should have the same cost
as (const_int 0).  But I'm nowhere near well versed in the cost model for the
rx port.  So I'm just leaving the test as xfailed.  If someone cares enough,
they can dig into it further.

gcc/testsuite
* gcc.target/rx/movsicc.c: Broken down into ...
* gcc.target/rx/movsicc-1.c: Here.
* gcc.target/rx/movsicc-2.c: Here.
* gcc.target/rx/movsicc-3.c: Here.  xfail one test.
* gcc.target/rx/movsicc-4.c: Here.
* gcc.target/rx/movsicc-5.c: Here.
* gcc.target/rx/movsicc-6.c: Here.
* gcc.target/rx/movsicc-7.c: Here.
* gcc.target/rx/movsicc-8.c: Here.

match.pd: Fix fneg/fadd optimization [PR109583]

The following testcase ICEs on x86, foo function since my r14-22
improvement, but bar already since r13-4122.  The problem is the same,
in the if expression related_vector_mode is called and that starts with
  gcc_assert (VECTOR_MODE_P (vector_mode));
but nothing in the fneg/fadd match.pd pattern actually checks if the
VEC_PERM type has VECTOR_MODE_P (vec_mode).  In this case it has BLKmode
and so it ICEs.

The following patch makes sure we don't ICE on it.

2023-04-22  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109583
* match.pd (fneg/fadd simplify): Don't call related_vector_mode
if vec_mode is not VECTOR_MODE_P.

* gcc.dg/pr109583.c: New test.

Update loop estimate after header duplication

Loop header copying implements partial loop peelng.  If all exits of the loop
are peeled (which is the common case) the number of iterations decreases by 1.
Without noting this, for loops iterating zero times, we end up re-peeling them
later in the loop peeling pass which is wasteful.

This patch commonizes the code for esitmate update and adds logic to detect
when all (likely) exits were peeled by loop-ch.

We are still wrong about update of estimate however: if the exits behave
randomly with given probability, loop peeling does not decrease expected
iteration counts, just decreases probability that loop will be executed.
In this case we thus incorrectly decrease any_estimate. Doing so however
at least help us to not peel or optimize hard the lop later.

If the loop iterates precisely the estimated nuner of iterations. the estimate
decreases, but we are wrong about decreasing the header frequncy.  We already
have logic that tries to prove that loop exit will not be taken in peeled out
iterations and it may make sense to special case this.

I also fixed problem where we had off by one error in iteration count updating.
It makes perfect sense to expect loop to have 0 iterations.  However if bounds
drops to negative, we lose info about the loop behaviour (since we have no
profile data reaching the loop body).

Bootstrapped/regtested x86_64-linux, comitted.
Honza

gcc/ChangeLog:

2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
    Ondrej Kubanek  <kubanek0ondrej@gmail.com>

* cfgloopmanip.h (adjust_loop_info_after_peeling): Declare.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix updating of
loop profile and bounds after header duplication.
* tree-ssa-loop-ivcanon.cc (adjust_loop_info_after_peeling):
Break out from try_peel_loop; fix handling of 0 iterations.
(try_peel_loop): Use adjust_loop_info_after_peeling.

gcc/testsuite/ChangeLog:

2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
    Ondrej Kubanek  <kubanek0ondrej@gmail.com>

* gcc.dg/tree-ssa/peel1.c: Decrease number of peels by 1.
* gcc.dg/unroll-8.c: Decrease loop iteration estimate.
* gcc.dg/tree-prof/peel-2.c: New test.

Daily bump.

Do not fold ADDR_EXPR conditions leading to builtin_unreachable early.

Ranges can not represent &var globally yet, so we cannot fold these
expressions early or we lose the __builtin_unreachable information.

PR tree-optimization/109546
gcc/
* tree-vrp.cc (remove_unreachable::remove_and_update_globals): Do
not fold conditions with ADDR_EXPR early.

gcc/testsuite/
* gcc.dg/pr109546.c: New.

c++: fix 'unsigned typedef-name' extension [PR108099]

In the comments for PR108099 Jakub provided some testcases that demonstrated
that even before the regression noted in the patch we were getting the
semantics of this extension wrong: in the unsigned case we weren't producing
the corresponding standard unsigned type but another distinct one of the
same size, and in the signed case we were just dropping it on the floor and
not actually returning a signed type at all.

The former issue is fixed by using c_common_signed_or_unsigned_type instead
of unsigned_type_for, and the latter issue by adding a (signed_p &&
typedef_decl) case.

This patch introduces a failure on std/ranges/iota/max_size_type.cc due to
the latter issue, since the testcase expects 'signed rep_t' to do something
sensible, and previously we didn't. Now that we do, it exposes a bug in the
__max_diff_type::operator>>= handling of sign extension: when we evaluate
-1000 >> 2 in __max_diff_type we keep the MSB set, but leave the
second-most-significant bit cleared.

PR c++/108099

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Don't clear typedef_decl after 'unsigned
typedef' pedwarn. Use c_common_signed_or_unsigned_type. Also
handle 'signed typedef'.

gcc/testsuite/ChangeLog:

* g++.dg/ext/int128-8.C: Remove xfailed dg-bogus markers.
* g++.dg/ext/unsigned-typedef2.C: New test.
* g++.dg/ext/unsigned-typedef3.C: New test.

configure: Only create serdep.tmp if needed

There's no reason to create this file if none of the serial configure
options are passed.

ChangeLog:

* configure: Regenerate.
* configure.ac: Only create serdep.tmp if needed

gcc/m2: Drop references to $(P)

$(P) seems to have been a workaround for some old, proprietary make
implementations that we no longer support. It was removed in
r0-31149-gb8dad04b688e9c.

gcc/m2/ChangeLog:

* Make-lang.in: Remove references to $(P).
* Make-maintainer.in: Ditto.

Adjust x86 testsuite for recent if-conversion cost checking

gcc/testsuite
PR testsuite/109549
* gcc.target/i386/cmov6.c: No longer expect this test to
generate 'cmov' instructions.

aarch64: Emit single-instruction for smin (x, 0) and smax (x, 0)

Motivated by https://reviews.llvm.org/D148249, we can expand to a single instruction
for the SMIN (x, 0) and SMAX (x, 0) cases using the combined AND/BIC and ASR operations.
Given that we already have well-fitting TARGET_CSSC patterns and expanders for the min/max codes
in the backend this patch does some minor refactoring to ensure we emit the right SMAX/SMIN RTL codes
for TARGET_CSSC, fall back to the generic expanders or emit a simple SMIN/SMAX with 0 RTX for !TARGET_CSSC
that is now matched by a separate pattern.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_umax<mode>3_insn): Delete.
(umax<mode>3): Emit raw UMAX RTL instead of going through gen_ function
for umax.
(<optab><mode>3): New define_expand for MAXMIN_NOUMAX codes.
(*aarch64_<optab><mode>3_zero): Define.
(*aarch64_<optab><mode>3_cssc): Likewise.
* config/aarch64/iterators.md (maxminand): New code attribute.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sminmax-asr_1.c: New test.

PR target/108779 aarch64: Implement -mtp= option

A user has requested that we support the -mtp= option in aarch64 GCC for changing
the TPIDR register to read for TLS accesses. I'm not a big fan of the option name,
but we already support it in the arm port and Clang supports it for AArch64 already,
where it accepts the 'el0', 'el1', 'el2', 'el3' values.

This patch implements the same functionality in GCC.

Bootstrapped and tested on aarch64-none-linux-gnu.
Confirmed with godbolt that the sequences and options are the same as what Clang accepts/generates.

gcc/ChangeLog:

PR target/108779
* config/aarch64/aarch64-opts.h (enum aarch64_tp_reg): Define.
* config/aarch64/aarch64-protos.h (aarch64_output_load_tp):
Define prototype.
* config/aarch64/aarch64.cc (aarch64_tpidr_register): Declare.
(aarch64_override_options_internal): Handle the above.
(aarch64_output_load_tp): New function.
* config/aarch64/aarch64.md (aarch64_load_tp_hard): Call
aarch64_output_load_tp.
* config/aarch64/aarch64.opt (aarch64_tp_reg): Define enum.
(mtp=): New option.
* doc/invoke.texi (AArch64 Options): Document -mtp=.

gcc/testsuite/ChangeLog:

PR target/108779
* gcc.target/aarch64/mtp.c: New test.
* gcc.target/aarch64/mtp_1.c: New test.
* gcc.target/aarch64/mtp_2.c: New test.
* gcc.target/aarch64/mtp_3.c: New test.
* gcc.target/aarch64/mtp_4.c: New test.

aarch64: PR target/99195 Add scheme to optimise away vec_concat with zeroes on 64-bit Advanced SIMD ops

I finally got around to trying out the define_subst approach for PR target/99195.
The problem we have is that many Advanced SIMD instructions have 64-bit vector variants that
clear the top half of the 128-bit Q register. This would allow the compiler to avoid generating
explicit zeroing instructions to concat the 64-bit result with zeroes for code like:
vcombine_u16(vadd_u16(a, b), vdup_n_u16(0))
We've been getting user reports of GCC missing this optimisation in real world code, so it's worth
doing something about it.
The straightforward approach that we've been taking so far is adding extra patterns in aarch64-simd.md
that match the 64-bit result in a vec_concat with zeroes. Unfortunately for big-endian the vec_concat
operands to match have to be the other way around, so we would end up adding two extra define_insns.
This would lead to too much bloat in aarch64-simd.md

This patch defines a pair of define_subst constructs that allow us to annotate patterns in aarch64-simd.md
with the <vczle> and <vczbe> subst_attrs and the compiler will automatically produce the vec_concat widening patterns,
properly gated for BYTES_BIG_ENDIAN when needed. This seems like the least intrusive way to describe the extra zeroing semantics.

I've had a look at the generated insn-*.cc files in the build directory and it seems that define_subst does what we want it to do
when applied multiple times on a pattern in terms of insn conditions and modes.

This patch adds the define_subst machinery and adds the annotations to some of the straightforward binary and unary integer
operations. Many more such annotations are possible and I aim add them in future patches if this approach is acceptable.

Bootstrapped and tested on aarch64-none-linux-gnu and on aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Define.
(add_vec_concat_subst_be): Likewise.
(vczle): Likewise.
(vczbe): Likewise.
(add<mode>3): Rename to...
(add<mode>3<vczle><vczbe>): ... This.
(sub<mode>3): Rename to...
(sub<mode>3<vczle><vczbe>): ... This.
(mul<mode>3): Rename to...
(mul<mode>3<vczle><vczbe>): ... This.
(and<mode>3): Rename to...
(and<mode>3<vczle><vczbe>): ... This.
(ior<mode>3): Rename to...
(ior<mode>3<vczle><vczbe>): ... This.
(xor<mode>3): Rename to...
(xor<mode>3<vczle><vczbe>): ... This.
* config/aarch64/iterators.md (VDZ): Define.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: New test.

c++, tree: optimize walk_tree_1 and cp_walk_subtrees

These functions currently repeatedly dereference tp during the subtree
walks, dereferences which the compiler can't CSE because it can't
guarantee that the subtree walking doesn't modify *tp.

But we already implicitly require that TREE_CODE (*tp) remains the same
throughout the subtree walks, so it doesn't seem to be a huge leap to
strengthen that to requiring *tp remains the same.

So this patch manually CSEs the dereferences of *tp. This means that a
callback function can no longer replace *tp with another tree (of the
same TREE_CODE) when walking one of its subtrees, but that doesn't sound
like a useful capability anyway.

gcc/cp/ChangeLog:

* tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp.
<case DECLTYPE_TYPE>: Use cp_unevaluated and WALK_SUBTREE.
<case ALIGNOF_EXPR etc>: Likewise.

gcc/ChangeLog:

* tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp
and type_p.

Add Ajit Kumar Agarwal to write after approval

2023-04-21 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.

Fix boostrap failure in tree-ssa-loop-ch.cc

I managed to mix up patch and its WIP version in previous commit.
This patch adds the missing edge iterator and also fixes a side
case where new loop header would have multiple latches.

gcc/ChangeLog:

* tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix previous
commit.

expansion: make layout of x_shift*cost[][][] more efficient

when debugging expmed.[ch] for PR/108987 saw that some of the cost arrays have
less than ideal layout as follows:

x_shift*cost[0..63][speed][modes]

We would want speed to be first index since a typical compile will have
that fixed, followed by mode and then the shift values.

It should be non-functional from compiler semantics pov, except
executing slightly faster due to better locality of shift values for
given speed and mode. And also a bit more intutive when debugging.

gcc/Changelog:

* expmed.h (x_shift*_cost): convert to int [speed][mode][shift].
(shift*_cost_ptr ()): Access x_shift*_cost array directly.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

MAINTAINERS: add Vineet Gupta to write after approval

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
Ref: <680c7bbe-5d6e-07cd-8468-247afc65e1dd@gmail.com>

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

[aarch64] Use force_reg instead of copy_to_mode_reg.

Use force_reg instead of copy_to_mode_reg in aarch64_simd_dup_constant
and aarch64_expand_vector_init to avoid creating pseudo if original value
is already in a register.

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_simd_dup_constant): Use
force_reg instead of copy_to_mode_reg.
(aarch64_expand_vector_init): Likewise.

i386: Remove REG_OK_FOR_INDEX/REG_OK_FOR_BASE and their derivatives

x86 was converted to TARGET_LEGITIMATE_ADDRESS_P long ago.  Remove
remnants of the conversion.  Also, cleanup the remaining macros a bit
by introducing INDEX_REGNO_P macro.

No functional change.

gcc/ChangeLog:

2023-04-21  Uroš Bizjak  <ubizjak@gmail.com>

* config/i386/i386.h (REG_OK_FOR_INDEX_P, REG_OK_FOR_BASE_P): Remove.
(REG_OK_FOR_INDEX_NONSTRICT_P,  REG_OK_FOR_BASE_NONSTRICT_P): Ditto.
(REG_OK_FOR_INDEX_STRICT_P, REG_OK_FOR_BASE_STRICT_P): Ditto.

(FIRST_INDEX_REG, LAST_INDEX_REG): New defines.
(LEGACY_INDEX_REG_P, LEGACY_INDEX_REGNO_P): New macros.
(INDEX_REG_P, INDEX_REGNO_P): Ditto.

(REGNO_OK_FOR_INDEX_P): Use INDEX_REGNO_P predicates.

(REGNO_OK_FOR_INDEX_NONSTRICT_P): New macro.
(EG_OK_FOR_BASE_NONSTRICT_P): Ditto.

* config/i386/predicates.md (index_register_operand):
Use REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros.

* config/i386/i386.cc (ix86_legitimate_address_p): Use
REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_BASE_NONSTRICT_P,
REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros.

Fix latent bug in loop header copying which forgets to update the loop header pointer

gcc/ChangeLog:

2023-04-21 Jan Hubicka <hubicka@ucw.cz>
Ondrej Kubanek <kubanek0ondrej@gmail.com>

* tree-ssa-loop-ch.cc (ch_base::copy_headers): Update loop header and
latch.

Add safe_is_a

The following adds safe_is_a, an is_a check handling nullptr
gracefully.

* is-a.h (safe_is_a): New.

Add operator* to gimple_stmt_iterator and gphi_iterator

This allows STL style iterator dereference. It's the same
as gsi_stmt () or .phi ().

* gimple-iterator.h (gimple_stmt_iterator::operator*): Add.
(gphi_iterator::operator*): Likewise.

Stabilize inliner

The Fibonacci heap can change its behaviour quite significantly for no good
reasons when multiple edges with same key occurs.  This is quite common
for small functions.

This patch stabilizes the order by adding edge uids into the info.
Again I think this is good idea regardless of the incremental WPA project
since we do not want random changes in inline decisions.

gcc/ChangeLog:

2023-04-21  Jan Hubicka  <hubicka@ucw.cz>
    Michal Jires  <michal@jires.eu>

* ipa-inline.cc (class inline_badness): New class.
(edge_heap_t, edge_heap_node_t): Use inline_badness for badness instead
of sreal.
(update_edge_key): Update.
(lookup_recursive_calls): Likewise.
(recursive_inlining): Likewise.
(add_new_edges_to_heap): Likewise.
(inline_small_functions): Likewise.

Cleanup odr_types_equivalent_p

gcc/ChangeLog:

2023-04-21 Jan Hubicka <hubicka@ucw.cz>

* ipa-devirt.cc (odr_types_equivalent_p): Cleanup warned checks.

PR modula2/109586 cc1gm2 ICE when compiling large source files.

The function m2block_RememberConstant calls m2tree_IsAConstant.
However IsAConstant does not recognise TREE_CODE(t) ==
CONSTRUCTOR as a constant. Without this patch CONSTRUCTOR
contants are garbage collected (and not preserved) resulting in
a corrupt tree and crash.

gcc/m2/ChangeLog:

PR modula2/109586
* gm2-gcc/m2tree.cc (m2tree_IsAConstant): Add (TREE_CODE
(t) == CONSTRUCTOR) to expression.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

tree-optimization/109573 - avoid ICEing on unexpected live def

The following relaxes the assert in vectorizable_live_operation
where we catch currently unhandled cases to also allow an
intermediate copy as it happens here but also relax the assert
to checking only.

PR tree-optimization/109573
* tree-vect-loop.cc (vectorizable_live_operation): Allow
unhandled SSA copy as well. Demote assert to checking only.

* g++.dg/vect/pr109573.cc: New testcase.

Use correct CFG orders for DF worklist processing

This adjusts the remaining three RPO computes in DF.  The DF_FORWARD
problems should use a RPO on the forward graph, the DF_BACKWARD
problems should use a RPO on the inverted graph.

Conveniently now inverted_rev_post_order_compute computes a RPO.
We still use post_order_compute and reverse its order for its
side-effect of deleting unreachable blocks.

This resuls in an overall reduction on visited blocks on cc1files by 5.2%.

Because on the reverse CFG most regions are irreducible, there's
few cases the number of visited blocks increases.  For the set
of cc1files I have this is for et-forest.i, graph.i, hwint.i,
tree-ssa-dom.i, tree-ssa-loop-ch.i and tree-ssa-threadedge.i.  For
tree-ssa-dse.i it's off-noise and I've more closely investigated
and figured it is really bad luck due to the irreducibility.

* df-core.cc (df_analyze): Compute RPO on the reverse graph
for DF_BACKWARD problems.
(loop_post_order_compute): Rename to ...
(loop_rev_post_order_compute): ... this, compute a RPO.
(loop_inverted_post_order_compute): Rename to ...
(loop_inverted_rev_post_order_compute): ... this, compute a RPO.
(df_analyze_loop): Use RPO on the forward graph for DF_FORWARD
problems, RPO on the inverted graph for DF_BACKWARD.

change inverted_post_order_compute to inverted_rev_post_order_compute

The following changes the inverted_post_order_compute API back to
a plain C array interface and computing a reverse post order since
that's what's always required.  It will make massaging DF to use
the correct iteration orders easier.  Elsewhere it requires turning
backward iteration over the computed order with forward iteration.

* cfganal.h (inverted_rev_post_order_compute): Rename
from ...
(inverted_post_order_compute): ... this.  Add struct function
argument, change allocation to a C array.
* cfganal.cc (inverted_rev_post_order_compute): Likewise.
* lcm.cc (compute_antinout_edge): Adjust.
* lra-lives.cc (lra_create_live_ranges_1): Likewise.
* tree-ssa-dce.cc (remove_dead_stmt): Likewise.
* tree-ssa-pre.cc (compute_antic): Likewise.

change DF to use the proper CFG order for DF_FORWARD problems

This changes DF to use RPO on the forward graph for DF_FORWARD
problems.  While that naturally maps to pre_and_rev_postorder_compute
we use the existing (wrong) CFG order for DF_BACKWARD problems
computed by post_order_compute since that provides the required
side-effect of deleting unreachable blocks.

The change requires turning the inconsistent vec<int> vs int * back
to consistent int *.  A followup patch will change the
inverted_post_order_compute API and change the DF_BACKWARD problem
to use the correct RPO on the backward graph together with statistics
I produced last year for the combined effect.

* df.h (df_d::postorder_inverted): Change back to int *,
clarify comments.
* df-core.cc (rest_of_handle_df_finish): Adjust.
(df_analyze_1): Likewise.
(df_analyze): For DF_FORWARD problems use RPO on the forward
graph.  Adjust.
(loop_inverted_post_order_compute): Adjust API.
(df_analyze_loop): Adjust.
(df_get_n_blocks): Likewise.
(df_get_postorder): Likewise.

RISC-V: Defer vsetvli insertion to later if possible [PR108270]

Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
{
  vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
  __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
}
    }
  }
}

Compile option: -O3

Before this patch:
mv      a7,a2
mv      a6,a0
mv      t1,a1
mv      a2,a3
vsetivli zero,17,e8,mf8,ta,ma
ble     a7,zero,.L1
ble     a4,zero,.L1
ble     a3,zero,.L1
...

After this patch:
mv      a7,a2
mv      a6,a0
mv      t1,a1
mv      a2,a3
ble     a7,zero,.L1
ble     a4,zero,.L1
ble     a3,zero,.L1
add     a1,a0,a4
li      a0,0
vsetivli zero,17,e8,mf8,ta,ma
...

This issue is a missed optmization produced by Phase 3 global backward demand
fusion instead of LCM.

This patch is fixing poor placement of the vsetvl.

This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand info
backward fusion and propogation) which
is I introduced into VSETVL PASS to enhance LCM && improve vsetvl instruction
performance.

This patch is to supress the Phase 3 too aggressive backward fusion and
propagation to the top of the function program
when there is no define instruction of AVL (AVL is 0 ~ 31 imm since vsetivli
instruction allows imm value instead of reg).

You may want to ask why we need Phase 3 to the job.
Well, we have so many situations that pure LCM fails to optimize, here I can
show you a simple case to demonstrate it:

void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
    if (cond) {
      for (size_t i = 0; i < n; i++)
{
  vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
  __riscv_vse8_v_i8mf8 (out + i, v, vl);
}
    } else {
      for (size_t i = 0; i < n; i++)
{
  vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
  v = __riscv_vadd_vv_i32mf2 (v,v,vl);
  __riscv_vse32_v_i32mf2 (out + i, v, vl);
}
    }
  }
}

You can see:
The first inner loop needs vsetvli e8 mf8 for vle+vse.
The second inner loop need vsetvli e32 mf2 for vle+vadd+vse.

If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with :

outerloop:
...
vsetvli e8mf8
inner loop 1:
....

vsetvli e32mf2
inner loop 2:
....

However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of
inner loop 2 into vsetvli e8 mf8, then we will end up with this result after
phase 3:

outerloop:
...
inner loop 1:
vsetvli e32mf2
....

inner loop 2:
vsetvli e32mf2
....

Then, this demand information after phase 3 will be well optimized after phase 4
(LCM), after Phase 4 result is:

vsetvli e32mf2
outerloop:
...
inner loop 1:
....

inner loop 2:
....

You can see this is the optimal codegen after current VSETVL PASS (Phase 3:
Demand backward fusion and propagation + Phase 4: LCM ). This is a known issue
when I start to implement VSETVL PASS.

gcc/ChangeLog:

PR target/108270
* config/riscv/riscv-vsetvl.cc
(vector_infos_manager::all_empty_predecessor_p): New function.
(pass_vsetvl::backward_demand_fusion): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

PR target/108270
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
* gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.

riscv: Fix <bitmanip_insn> fallout.

PR109582: Since r14-116 generic.md uses standard names instead of the
types defined in the <bitmanip_insn> iterator (that match instruction
names). Change this.

gcc/ChangeLog:

PR target/109582
* config/riscv/generic.md: Change standard names to insn names.

rs6000: xfail float128 comparison test case that fails on powerpc64.

This patch xfails a float128 comparison test case on powerpc64 that
fails due to a longstanding issue with floating-point compares.

See PR58684 for more information.

When float128 hardware is enabled (-mfloat128-hardware), xscmpuqp is
generated for comparison which is unexpected. When float128 software
emulation is enabled (-mno-float128-hardware), we still have to xfail
the hardware version (__lekf2_hw) which finally generates xscmpuqp.

gcc/testsuite/
PR target/108728
* gcc.dg/torture/float128-cmp-invalid.c: Add xfail.

testsuite: make ppc_cpu_supports_hw as effective target keyword [PR108728]

gcc/testsuite/
PR target/108728
* lib/target-supports.exp (is-effective-target-keyword): Add
ppc_cpu_supports_hw.

Fix LCM dataflow CFG order

The following fixes the initial order the LCM dataflow routines process
BBs. For a forward problem you want reverse postorder, for a backward
problem you want reverse postorder on the inverted graph.

The LCM iteration has very many other issues but this allows to
turn inverted_post_order_compute into computing a reverse postorder
more easily.

* lcm.cc (compute_antinout_edge): Use RPO on the inverted graph.
(compute_laterin): Use RPO.
(compute_available): Likewise.