git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

reload: Handle generating reloads that also clobbers flags

* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from
emit_insn_if_valid_for_reload.
(emit_insn_if_valid_for_reload): Call new helper, and if a SET fails
to be recognized, also try emitting a parallel that clobbers
TARGET_FLAGS_REGNUM, as applicable.

[xstormy16] Efficient HImode rotate left by a single bit.

This patch contains some minor tweak to xstormy16's machine description
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
  return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:    mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:
foo: shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
to a define_insn.
(*rotatehi_1): New define_insn for efficient 2 insn sequence.
(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/neghi2.c: New test case.
* gcc.target/xstormy16/rotatehi-1.c: Likewise.

[xstormy16] Recognize/support swpn (swap nibbles) instruction.

This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo: swpn r2
ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.  The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (any_lshift): New code iterator.
(any_or_plus): Likewise.
(any_rotate): Likewise.
(*<any_lshift>_and_internal): New define_insn_and_split to
recognize a logical shift followed by an AND, and split it
again after reload.
(*swpn): New define_insn matching xstormy16's swpn.
(*swpn_zext): New define_insn recognizing swpn followed by
zero_extendqihi2, i.e. with the high byte set to zero.
(*swpn_sext): Likewise, for swpn followed by cbw.
(*swpn_sext_2): Likewise, for an alternate RTL form.
(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
sequence is split in the correct place to recognize the *swpn_zext
followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/swpn-1.c: New QImode test case.
* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
* gcc.target/xstormy16/swpn-4.c: New HImode test case.

add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

PR target/105525 is a build regression for the vax and lm32 linux
targets present in gcc-12/13/head, where the builds fail due to
unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
caused by these two targets failing to provide glibc-stdint.h.

Fixed thusly, tested by building crosses, which now succeeds.

Ok for trunk? (Note I don't have commit rights.)

PR target/105525
gcc/
* config.gcc (vax-*-linux*): Add glibc-stdint.h.
(lm32-*-uclinux*): Likewise.

Adjust mips test for recent ifcvt costing changes

MIPS ports have been failing a few tests since the change to add cost
checks in another path through the if-converter pass.

As with the other ports, these look like cases where we don't do good
costing in the MIPS port.  Someone who cares about MIPS will need to
fix this properly.

In the mean time this patch adjusts the branch cost when running the
two affected tests and skips them at -Os.  This is enough to verify
that if conversion can still happen if the costs are adjusted.

gcc/testsuite
* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to
encourage if-conversion.  Skip for -Os.
* gcc.target/mips/movcc-3.c: Similarly.

RISC-V: decouple stack allocation for rv32e w/o save-restore

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi sp,sp,-16
sw ra,12(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,12(sp)
addi sp,sp,16
jr ra

after patch:
addi sp,sp,-8
sw ra,4(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,4(sp)
addi sp,sp,8
jr ra

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function
for riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_avoid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_stack.c: New test.

Daily bump.

testsuite: Handle empty assembly lines in check-function-bodies

I tried to make use of check-function-bodies for cris-elf and was a
bit surprised to see it failing.  There's a deliberate empty line
after the filled delay slot of the return-function which was
mishandled.  I thought "aha" and tried to add an empty line
(containing just a "**" prefix) to the match, but that didn't help.
While it was added as input from the function's assembly output
to-be-matched like any other line, it couldn't be matched: I had to
use "...", which works but is...distracting.

Some digging shows that an empty assembly line can't be deliberately
matched because all matcher lines (lines starting with the prefix,
the ubiquitous "**") are canonicalized by trimming leading
whitespace (the "string trim" in check-function-bodies) and instead
adding a leading TAB character, thus empty lines end up containing
just a TAB.  For usability it's better to treat empty lines as fluff
than to uglifying the test-case and the code to properly match them.
Double-checking, no test-case tries to match an line containing just
TAB (by providing an a line containing just "**\s*", i.e. zero or
more whitespace characters).

* lib/scanasm.exp (parse_function_bodies): Set fluff to include
empty lines (besides optionally leading whitespace).

Fix autoprofiledbootstrap build

1. Fix gcov version
2. Merge perf data collected when compiling the compiler and runtime libraries
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Define PROFILE_MERGER
* Makefile.tpl: Define PROFILE_MERGER

gcc/c/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling cc1 and runtime libraries

gcc/cp/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling cc1plus and runtime libraries

gcc/lto/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling lto1 and runtime libraries

gcc/ChangeLog:

* doc/install.texi: Fix documentation typo

RISC-V: Add divmod expansion support

Hi all,
If we have division and remainder calculations with the same operands:

  a = b / c;
  d = b % c;

We can replace the calculation of remainder with multiplication +
subtraction, using the result from the previous division:

  a = b / c;
  d = a * c;
  d = b - d;

Which will be faster.
Currently, it isn't done for RISC-V.

I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.

Best regards,
Matevos.

gcc/ChangeLog:

* config/riscv/iterators.md (only_div, paired_mod): New iterators.
(u): Add div/udiv cases.
* config/riscv/riscv-protos.h (riscv_use_divmod_expander): Prototype.
* config/riscv/riscv.cc (struct riscv_tune_param): Add field for
divmod expansion.
(rocket_tune_info, sifive_7_tune_info): Initialize new field.
(thead_c906_tune_info): Likewise.
(optimize_size_tune_info): Likewise.
(riscv_use_divmod_expander): New function.
* config/riscv/riscv.md (<u>divmod<mode>4): New expander.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/divmod-1.c: New testcase.
* gcc.target/riscv/divmod-2.c: New testcase.

RISC-V: Added support clmul[r,h] instructions for Zbc extension.

clmul[h] instructions were added only for the ZBKC extension.
This patch includes them in the ZBC extension too.
Besides, added support of 'clmulr' instructions for ZBC extension.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added clmulr instruction.
* config/riscv/riscv-builtins.cc (AVAIL): Add new.
* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
(type): Add clmul
* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.
* config/riscv/generic.md: Add clmul to list of instructions
using the generic_imul reservation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: New test.
* gcc.target/riscv/zbc64.c: New test.

RISC-V: Eliminate redundant zero extension of minu/maxu operands

RV64 the following code:

  unsigned Min(unsigned a, unsigned b) {
      return a < b ? a : b;
  }

Compiles to:
  Min:
       zext.w  a1,a1
       zext.w  a0,a0
       minu    a0,a1,a0
       sext.w  a0,a0
       ret

This patch removes unnecessary zero extensions of minu/maxu operands.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
* gcc.target/riscv/zbb-min-max-03.c: New tests.

contrib: port doxygen script to Python3

contrib/ChangeLog:

* filter_gcc_for_doxygen: Use python3 and not python2.
* filter_params.py: Likewise.

PHIOPT: Move two_value_replacement to match.pd

This patch converts two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change,
does not check for the {0,+-1} case as that is handled
before in match.pd so there is no reason to do the extra
check for it.

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.

MATCH: Add patterns from phiopt's minmax_replacement

This adds a few patterns from phiopt's minmax_replacement
for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> .
It is progress to remove minmax_replacement from phiopt.
There are still some more cases dealing with constants on the
edges (0/INT_MAX) to handle in match.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: Add patterns for
"(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
as that now does the combining.

MATCH: Factor out code that for min max detection with constants

This factors out some of the code from the min/max detection
from match.pd into a function so it can be reused in other
places. This is mainly used to detect the conversions
of >= to > which causes the integer values to be changed by
one.

Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Factor out the deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.

PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR rtl-optimization/109476
* lower-subreg.cc: Include explow.h for force_reg.
(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
If decomposing a suitable LSHIFTRT and we're not splitting
ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
instead of setting a high part SUBREG to zero, which helps combine.
(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
PR rtl-optimization/109476
* gcc.target/avr/mmcu/pr109476.c: New test case.

Synchronize include/ctf.h with upstream binutils/libctf.

This patch updates include/ctf.h to match the current libctf version in
binutils' include/. I recently attempted to build a uber tree (following
some notes that are so old they used CVS) and noticed that binutils won't
build with gcc's top-level include, due to CTF_F_IDXSORTED not being
defined in ctf.h.

2023-04-28 Roger Sayle <roger@nextmovesoftware.com>

include/ChangeLog
* ctf.h: Import latest version from binutils/libctf.

Add emulated scatter capability to the vectorizer

This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
This is aimed at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the scatter.

The offset load is still vectorized and costed as such, but like
with emulated gather those will be turned back to scalar loads
by forwrpop.

* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
consider scatters.
* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
gather-scatter info and cost emulated scatters accordingly.
(get_load_store_type): Support emulated scatters.
(vectorizable_store): Likewise. Emulate them by extracting
scalar offsets and data, doing scalar stores.

* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
* gcc.dg/vect/vect-71.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.

Adjust costing of emulated vectorized gather/scatter

Emulated gather/scatter behave similar to strided elementwise
accesses in that they need to decompose the offset vector
and construct or decompose the data vector so handle them
the same way, pessimizing the cases with may elements.

For pr88531-2c.c instead of

.L4:
        leaq    (%r15,%rcx), %rdx
        incl    %edi
        movl    16(%rdx), %r13d
        movl    24(%rdx), %r14d
        movl    (%rdx), %r10d
        movl    4(%rdx), %r9d
        movl    8(%rdx), %ebx
        movl    12(%rdx), %r11d
        movl    20(%rdx), %r12d
        vmovss  (%rax,%r14,4), %xmm2
        movl    28(%rdx), %edx
        vmovss  (%rax,%r13,4), %xmm1
        vmovss  (%rax,%r10,4), %xmm0
        vinsertps       $0x10, (%rax,%rdx,4), %xmm2, %xmm2
        vinsertps       $0x10, (%rax,%r12,4), %xmm1, %xmm1
        vinsertps       $0x10, (%rax,%r9,4), %xmm0, %xmm0
        vmovlhps        %xmm2, %xmm1, %xmm1
        vmovss  (%rax,%rbx,4), %xmm2
        vinsertps       $0x10, (%rax,%r11,4), %xmm2, %xmm2
        vmovlhps        %xmm2, %xmm0, %xmm0
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
        vmulps  %ymm3, %ymm0, %ymm0
        vmovups %ymm0, (%r8,%rcx)
        addq    $32, %rcx
        cmpl    %esi, %edi
        jb      .L4

we now prefer

.L4:
        leaq    0(%rbp,%rdx,8), %rcx
        movl    (%rcx), %r10d
        movl    4(%rcx), %ecx
        vmovss  (%rsi,%r10,4), %xmm0
        vinsertps       $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
        vmulps  %xmm1, %xmm0, %xmm0
        vmovlps %xmm0, (%rbx,%rdx,8)
        incq    %rdx
        cmpl    %edi, %edx
        jb      .L4

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Tame down element extracts and scalar loads for gather/scatter
similar to elementwise strided accesses.

* gcc.target/i386/pr89618-2.c: New testcase.
* gcc.target/i386/pr88531-2b.c: Adjust.
* gcc.target/i386/pr88531-2c.c: Likewise.

RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

When some RVV integer compare operators act on the same vector
registers without mask. They can be simplified to VMCLR.

This PATCH allow the ne, lt, ltu, gt, gtu to perform such kind
of the simplification by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24                    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As above, we may have one instruction eliminated and require less
vector registers.

gcc/ChangeLog:

* config/riscv/vector.md: Add new define split to perform
the simplification.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>

libstdc++: Improve doxygen docs for <random>

Add @headerfile and @since tags. Add gamma_distribution to the correct
group (poisson distributions). Add a group for the sampling
distributions and add the missing definitions of their probability
functions. Add uniform_int_distribution back to the uniform
distributions group.

libstdc++-v3/ChangeLog:

* include/bits/random.h (gamma_distribution): Add to the right
doxygen group.
(discrete_distribution, piecewise_constant_distribution)
(piecewise_linear_distribution): Create a new doxygen group and
fix the incomplete doxygen comments.
* include/bits/uniform_int_dist.h (uniform_int_distribution):
Add to doxygen group.

libstdc++: Minor fixes to doxygen comments

libstdc++-v3/ChangeLog:

* include/bits/uses_allocator.h: Add missing @file comment.
* include/bits/regex.tcc: Remove stray doxygen comments.
* include/experimental/memory_resource: Likewise.
* include/std/bit: Tweak doxygen @cond comments.
* include/std/expected: Likewise.
* include/std/numbers: Likewise.

libstdc++: Strip absolute paths from files shown in Doxygen docs

This avoids showing absolute paths from the expansion of
@srcdir@/libsupc++/ in the doxygen File List view.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes
from header paths.

libstdc++: Simplify preprocessor/namespace nesting in <bits/move.h>

There's no good reason to conditionally close and reopen namespace std
within an #if block. Just include the <type_traits> header at the top
instead.

libstdc++-v3/ChangeLog:

* include/bits/move.h: Simplify opening/closing namespace std.

ipa/109652 - ICE in modification phase of IPA SRA

There's another questionable IL transform by IPA SRA, replacing
foo (p_1(D)->x) with foo (VIEW_CONVERT <union type> (ISRA.PARM.1))
where ISRA.PARM.1 is a register.  Conversion of a register to
an aggregate type is questionable but not entirely unreasonable
and not within the set of IL I am rejecting when fixing PR109644.

The following lets this slip through in IPA SRA transform by
restricting re-gimplification to the case of register type
results.  To not break the previous testcase again we need to
optimize the BIT_FIELD_REF <VIEW_CONVERT <...>, ...> case
to elide the conversion.

PR ipa/109652
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Allow
conversion of a register to a non-register type.  Elide
conversions inside BIT_FIELD_REFs.

* gcc.dg/torture/pr109652.c: New testcase.

OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]

This patch fixes several cases where multiple attach or detach mapping
nodes were being created for stand-alone attach or detach clauses
in Fortran. After the introduction of stricter checking later during
compilation, these extra nodes could cause ICEs, as seen in the PR.

The patch also fixes cases that "happened to work" previously where
the user attaches/detaches a pointer to array using a descriptor, and
(I think!) the "_data" field has offset zero, hence the same address as
the descriptor as a whole.

2023-04-27 Julian Brown <julian@codesourcery.com>

PR fortran/109622

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Attach/detach clause fixes.

gcc/testsuite/
* gfortran.dg/goacc/attach-descriptor.f90: Adjust expected output.

libgomp/
* testsuite/libgomp.fortran/pr109622.f90: New test.
* testsuite/libgomp.fortran/pr109622-2.f90: New test.
* testsuite/libgomp.fortran/pr109622-3.f90: New test.

tree-optimization/109644 - missing IL checking

We fail to verify the constraints under which we allow handled
components to wrap registers.  The gcc.dg/pr70022.c testcase shows
that we happily end up with

  _2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))

as produced by SSA rewrite and update_address_taken.  But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.

The following makes IL verification stricter which catches the
problem.

PR tree-optimization/109644
* tree-cfg.cc (verify_types_in_gimple_reference): Check
register constraints on the outermost VIEW_CONVERT_EXPR
only.  Do not allow register or invariant bases on
multi-level or possibly variable index handled components.

Avoid more invalid GIMPLE with register bases

The Ada frontend, for example with gnat.dg/inline2_pkg.adb, tends
to create VIEW_CONVERT expressions with aggregate type even of
non-aggregate entities.  In this case for example

  return <retval> = (BIT_FIELD_REF <VIEW_CONVERT_EXPR<struct inline2_pkg__ieee_short_real>(number), 16, 16> & 32640) != 32640;

currently gimplification and SSA rewrite turn this into

  _1 = BIT_FIELD_REF <VIEW_CONVERT_EXPR<struct inline2_pkg__ieee_short_real>(number_2(D));

which is two operations on a register.  While as seen with PR109652
we might not want to completely rule out register to aggregate type
VIEW_CONVERTs we definitely do not want to stack multiple ops here.

The solution is to make sure the gimplifier puts a non-register as
the base object.  For the above this will add

  number.1 = number;

and use number.1 in the compound reference.  Code generation is
unchanged, FRE optimizes this to BIT_FIELD_REF <number_2(D), ...>.
I think BIT_FIELD_REF <VIEW_CONVERT (x), ...> could be always
rewritten into BIT_FIELD_REF <x, ...>, but that's a separate thing.

* gimplify.cc (gimplify_compound_lval): When there's a
non-register type produced by one of the handled component
operations make sure we get a non-register base.

tree-optimization/108752 - vectorize emulated vectors in lowered form

The following makes sure to emit operations lowered to bit operations
when vectorizing using emulated vectors. This avoids relying on
the vector lowering pass adhering to the exact same cost considerations
as the vectorizer.

PR tree-optimization/108752
* tree-vect-generic.cc (build_replicated_const): Rename
to build_replicated_int_cst and move to tree.{h,cc}.
(do_plus_minus): Adjust.
(do_negate): Likewise.
* tree-vect-stmts.cc (vectorizable_operation): Emit emulated
arithmetic vector operations in lowered form.
* tree.h (build_replicated_int_cst): Declare.
* tree.cc (build_replicated_int_cst): Moved from
tree-vect-generic.cc build_replicated_const.

libstdc++: Another attempt to ensure g++ 13+ compiled programs enforce gcc 13.2+ libstdc++.so.6 [PR108969]

GCC used to emit an instance of an empty ios_base::Init class in
every TU which included <iostream> to ensure it is std::cout etc.
is initialized, but thanks to Patrick work on some targets (which have
init_priority attribute support) it is now initialized only inside of
libstdc++.so.6/libstdc++.a.

This causes a problem if people do something that has never been supported,
try to run GCC 13 compiled C++ code against GCC 12 or earlier
libstdc++.so.6 - std::cout etc. are then never initialized because code
including <iostream> expects the library to initialize it and the library
expects code including <iostream> to do that.

The following patch is second attempt to make this work cheaply as the
earlier attempt of aliasing the std::cout etc. symbols with another symbol
version didn't work out due to copy relocation breaking the aliases appart.

The patch forces just a _ZSt21ios_base_library_initv undefined symbol
into all *.o files which include <iostream> and while there is no runtime
relocation against that, it seems to enforce the right version of
libstdc++.so.6.  /home/jakub/src/gcc/obj08i/usr/local/ is the install
directory of trunk patched with this patch, /home/jakub/src/gcc/obj06/
is builddir of trunk without this patch, system g++ is GCC 12.1.1.
$ cat /tmp/hw.C
#include <iostream>

int
main ()
{
  std::cout << "Hello, world!" << std::endl;
}
$ cd /home/jakub/src/gcc/obj08i/usr/local/bin
$ ./g++ -o /tmp/hw /tmp/hw.C
$ readelf -Wa /tmp/hw 2>/dev/null | grep initv
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZSt21ios_base_library_initv@GLIBCXX_3.4.32 (4)
    71: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZSt21ios_base_library_initv@GLIBCXX_3.4.32
$ /tmp/hw
/tmp/hw: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/hw)
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj08i/usr/local/lib64/ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/ /tmp/hw
/tmp/hw: /home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/hw)
$ g++ -o /tmp/hw /tmp/hw.C
$ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj08i/usr/local/lib64/ /tmp/hw
Hello, world!

On sparc-sun-solaris2.11 one I've actually checked a version which had
defined(_GLIBCXX_SYMVER_SUN) next to defined(_GLIBCXX_SYMVER_GNU), but
init_priority attribute doesn't seem to be supported there and so I couldn't
actually test how this works there.  Using gas and Sun ld, Rainer, does one
need to use gas + gld for init_priority or something else?

2023-04-28  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/108969
* config/abi/pre/gnu.ver (GLIBCXX_3.4.32): Export
_ZSt21ios_base_library_initv.
* testsuite/util/testsuite_abi.cc (check_version): Add GLIBCXX_3.4.32
symver and make it the latestp.
* src/c++98/ios_init.cc (ios_base_library_init): New alias.
* acinclude.m4 (libtool_VERSION): Change to 6:32:0.
* include/std/iostream: If init_priority attribute is supported
and _GLIBCXX_SYMVER_GNU, force undefined _ZSt21ios_base_library_initv
symbol into the object.
* configure: Regenerated.

aarch64: PR target/99195 annotate more integer unary patterns for vec-concat with zero

More of the straightforward cases to annotate plus tests, this time for simple integer unary ops.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_rbit<mode>): Rename to...
(aarch64_rbit<mode><vczle><vczbe>): ... This.
(neg<mode>2): Rename to...
(neg<mode>2<vczle><vczbe>): ... This.
(abs<mode>2): Rename to...
(abs<mode>2<vczle><vczbe>): ... This.
(aarch64_abs<mode>): Rename to...
(aarch64_abs<mode><vczle><vczbe>): ... This.
(one_cmpl<mode>2): Rename to...
(one_cmpl<mode>2<vczle><vczbe>): ... This.
(clrsb<mode>2): Rename to...
(clrsb<mode>2<vczle><vczbe>): ... This.
(clz<mode>2): Rename to...
(clz<mode>2<vczle><vczbe>): ... This.
(popcount<mode>2): Rename to...
(popcount<mode>2<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for unary integer ops.

Fortran: Fix (mostly) comment typos

Only other changes are fixing the variable name a(b)breviated_modproc_decl
and a few typos in gfortran.texi.

gcc/fortran/ChangeLog:

* gfortran.texi: Fix typos.
* decl.cc: Fix typos in comments and in a variable name.
* arith.cc: Fix comment typos.
* check.cc: Likewise.
* class.cc: Likewise.
* dependency.cc: Likewise.
* expr.cc: Likewise.
* frontend-passes.cc: Likewise.
* gfortran.h: Likewise.
* intrinsic.cc: Likewise.
* iresolve.cc: Likewise.
* match.cc: Likewise.
* module.cc: Likewise.
* primary.cc: Likewise.
* resolve.cc: Likewise.
* simplify.cc: Likewise.
* trans-array.cc: Likewise.
* trans-decl.cc: Likewise.
* trans-expr.cc: Likewise.
* trans-intrinsic.cc: Likewise.
* trans-openmp.cc: Likewise.
* trans-stmt.cc: Likewise.

gimple-range-op: Handle sqrt (basic bounds only)

The following patch adds sqrt support (but similarly to sincos, only
dumb basic ranges only).

Will improve this incrementally and sin/cos as well.

2023-04-28 Jakub Jelinek <jakub@redhat.com>

* gimple-range-op.cc (class cfn_sqrt): New type.
(op_cfn_sqrt): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_SQRT{,_FN}.

* gcc.dg/tree-ssa/range-sqrt.c: New test.
* gfortran.dg/ieee/ieee_6.f90: Make x volatile to avoid
ranger optimizing sqrt (-1) call away because it is only used in
test for whether it returns NaN.

Implement range-op entry for sin/cos

On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote:
> [I don't know why I keep poking at floats.  I must really like the pain.
>
> This is the range-op entry for sin/cos.  It is meant to serve as an
> example of what we can do for glibc math functions.  It is by no means
> exhaustive, just a stub to restrict the return range from sin/cos to
> [-1.0, 1.0] with appropriate smarts of NANs.
>
> As can be seen in the testcase, we see sin() as well as
> __builtin_sin() in the IL, and can resolve the resulting range
> accordingly.

Here is an updated version of the patch on top of the
Add targetm.libm_function_max_error
patch with all my comments incorporated into your patch (but still no
handling of sin/cos ranges shorter than 2*M_PI).

2023-04-28  Aldy Hernandez  <aldyh@redhat.com>
    Jakub Jelinek  <jakub@redhat.com>

* value-range.h (frange_nextafter): Declare.
* gimple-range-op.cc (class cfn_sincos): New.
(op_cfn_sin, op_cfn_cos): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_{SIN,COS}{,_FN}.

* gcc.dg/tree-ssa/range-sincos.c: New test.

Add targetm.libm_function_max_error

As has been discussed before, the following patch adds target hook
for math library function maximum errors measured in ulps.
The default is to return ~0U which is a magic maximum value which means
nothing is known about precision of the match function.

The first argument is unsigned int because enum combined_fn isn't available
everywhere where target hooks are included but is expected to be given
the enum combined_fn value, although it should be used solely to find out
which kind of match function (say sin vs. cos vs. sqrt vs. exp10) rather
than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for
which there is the machine_mode argument.
The last argument is a bool, if it is false, the function should return
maximum known error in ulps for a given function (taking -frounding-math
into account if enabled), with 0.5ulps being represented as 0.
If it is true, it is about whether the function can return values outside of
an intrinsic finite range for the function and by how many ulps.
E.g. sin/cos should return result in [-1.,1], if the function is expected
to never return values outside of that finite interval, the hook should
return 0.  Similarly for sqrt such range is [-0.,+Inf].

The patch implements it for glibc only so far, I hope other maintainers
can submit details for Solaris, musl, perhaps BSDs, etc.
For glibc I've gathered data from:
1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
   as latest published glibc data
2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
   as a few years old glibc data
3) using attached libc-ulps.sh script from glibc git
4) using attached ulp-tester.c (how to invoke in file comment; tested
   both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
   modes, plus on x86_64 float/double sin/cos using libmvec - see
   attached libmvec-wrapper.c as well)
5) using attached boundary-tester.c to test for whether sin/cos/sqrt return
   values outside of the intrinsic ranges for those functions (again,
   tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well;
   libmvec with non-default rounding modes is pretty much random number
   generator it seems)

The data is added to various hooks, the generic and generic glibc versions
being in targhooks.c so that the various targets can easily override it.
The intent is that the generic glibc version handles most of the stuff
and specific target arch overrides handle the outliers or special cases.
The patch has special case for x86_64 when __FAST_MATH__ is defined (as
one can use in that case either libm or libmvec and we don't know which
one will be used; so it uses maximum of what libm provides and libmvec),
rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps
on most other targets; MODE_COMPOSITE_P could be in theory handled in the
generic code too, but as we have rs6000-linux specific function, it can be
done just there), arc-linux (because DFmode sin has 7ulps there compared to
1ulps on other targets, both in default rounding mode and in others) and
or1k-linux (while DFmode sin has 1ulps there for default rounding mode,
for other rounding modes it has up to 7ulps).
Now, for -frounding-math I'm trying to add a few ulps more because I expect
it to be much less tested, except that for boundary_p I try to use
the numbers I got from the 5) tester.

2023-04-28  Jakub Jelinek  <jakub@redhat.com>

* target.def (libm_function_max_error): New target hook.
* doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
* doc/tm.texi: Regenerated.
* targhooks.h (default_libm_function_max_error,
glibc_linux_libm_function_max_error): Declare.
* targhooks.cc: Include case-cfn-macros.h.
(default_libm_function_max_error,
glibc_linux_libm_function_max_error): New functions.
* config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/linux-protos.h (linux_libm_function_max_error): Declare.
* config/linux.cc: Include target.h and targhooks.h.
(linux_libm_function_max_error): New function.
* config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
(arc_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
(ix86_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/rs6000-protos.h
(rs6000_linux_libm_function_max_error): Declare.
* config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
and case-cfn-macros.h.
(rs6000_linux_libm_function_max_error): New function.
* config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
(or1k_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.

testsuite/C++: suppress filename canonicalization in module tests

The pathname underneath gcm.cache/ is determined from the effective name
used for the main input file of a particular module. When modules are
built, no canonicalization occurs for the main input file. Hence the
module file wouldn't be found if a different (the canonicalized) file
name was used when importing that same module. (This is an effect of
importing happening in the preprocessor, just like #include handling.)

Since it doesn't look easy to make module generation use libcpp's
maybe_shorter_path() (in fact I'd consider this a layering violation,
while cloning the logic would - at least in principle - be prone to both
going out of sync), simply suppress system header path canonicalization
for the respective tests.

gcc/testsuite/

* g++.dg/modules/alias-1_b.C: Add -fno-canonical-system-headers.
* g++.dg/modules/alias-1_d.C: Likewise.
* g++.dg/modules/alias-1_e.C: Likewise.
* g++.dg/modules/alias-1_f.C: Likewise.
* g++.dg/modules/cpp-6_c.C: Likewise.
* g++.dg/modules/dir-only-2_b.C: Likewise.

testsuite/C++: cope with IPv6 being unavailable

When IPv6 is disabled in the kernel, the error message coming back from
Cody::OpenInet6() is different from the sole so far expected one.

gcc/testsuite/

* g++.dg/modules/bad-mapper-3.C: Relax failure pattern.

harden-conditionals: detach values before compares

The optimization barriers inserted after compares enable GCC to derive
information about the values from e.g. the taken paths, or the absence
of exceptions.  Move them before the original compares, so that the
reversed compares test copies of the original operands, without
further optimizations.

for  gcc/ChangeLog

* gimple-harden-conditionals.cc (insert_edge_check_and_trap):
Move detach value calls...
(pass_harden_conditional_branches::execute): ... here.
(pass_harden_compares::execute): Detach values before
compares.

for  gcc/testsuite/ChangeLog

* c-c++-common/torture/harden-cond-comp.c: New.

Daily bump.

Update gcc .po files

* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po,
zh_TW.po: Update.

amdgcn: Fix addsub bug

The vec_fmsubadd instuction actually had add twice, by mistake.

Also improve code-gen for all the complex patterns by using properly
undefined values. Mostly this just prevents the compiler reserving space
in the stack frame.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (cmul<conj_op><mode>3): Use gcn_gen_undef.
(cml<addsub_as><mode>4): Likewise.
(vec_addsub<mode>3): Likewise.
(cadd<rot><mode>3): Likewise.
(vec_fmaddsub<mode>4): Likewise.
(vec_fmsubadd<mode>4): Likewise, and use sub for the odd lanes.

c++: print conversion error at candidate location

In testcases like this one, the printing of candidates in a diagnostic has
been longer than necessary because it jumps back and forth between the call
site and the candidate site. So here, we first say at the call site that no
match was found; then we note the candidate site, and then explain why it's
not suitable back at the call site, which means printing the call site line
with caret again. With this patch, the conversion diagnostic is at the same
location as the candidate, so we don't need to print any input line.

gcc/cp/ChangeLog:

* call.cc (print_conversion_rejection): Use iloc_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/template/copy1.C: Adjust error lines.

RISC-V: Add required tls to read thread pointer test

The read-thread-pointer test may require the gcc configured
with --enable-tls. If no, there x4 (aka tp) register will not
be presented in the assembly code.

This patch requires the tls for the dg checking. It will perform
the test checking if --enable-tls and mark the test as unsupported
if --disable-tls.

Configured with --enable-tls:
=== gcc Summary ===
of expected passes 16

Configured with --disable-tls:
=== gcc Summary ===
of unsupported tests 8

gcc/testsuite/ChangeLog:

* gcc.target/riscv/read-thread-pointer.c: Add required tls.

Signed-off-by: Pan Li <pan2.li@intel.com>

PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phiopt

In the early PHIOPT mode, the original minmax_replacement, would
replace a PHI node with up to 2 min/max expressions in some cases,
this allows for that too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
up to 2 min/max expressions in the sequence/match code.

MIN/MAX should be treated similar as comparisons for trapping

While looking into moving optimizations from minmax_replacement
in phiopt to match.pd, I Noticed that min/max were considered
trapping even if -ffinite-math-only was being used. This changes
those expressions to be similar as comparisons so that they are
not considered trapping if -ffinite-math-only is on.

OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.

gcc/ChangeLog:

* rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX similar as
COMPARISON.
* tree-eh.cc (operation_could_trap_helper_p): Treate
MIN_EXPR/MAX_EXPR similar as other comparisons.

PHIOPT: Move store_elim_worker into pass_cselim::execute

This simple patch moves the body of store_elim_worker
direclty into pass_cselim::execute.

Also removes unneeded prototypes too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (cond_store_replacement): Remove
prototype.
(cond_if_else_store_replacement): Likewise.
(get_non_trapping): Likewise.
(store_elim_worker): Move into ...
(pass_cselim::execute): This.

PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::execute

Now that store elimination and phiopt does not
share outer code, we can move tree_ssa_phiopt_worker
directly into pass_phiopt::execute and remove
many declarations (prototypes) from the file.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (two_value_replacement): Remove
prototype.
(match_simplify_replacement): Likewise.
(factor_out_conditional_conversion): Likewise.
(value_replacement): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_removal_in_builtin_zero_pattern): Likewise.
(hoist_adjacent_loads): Likewise.
(tree_ssa_phiopt_worker): Move into ...
(pass_phiopt::execute): this.

PHIOPT: Split out store elimination from phiopt

Since the last cleanups, it made easier to see
that we should split out the store elimination
worker from tree_ssa_phiopt_worker function.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
do_store_elim argument and split that part out to ...
(store_elim_worker): This new function.
(pass_cselim::execute): Call store_elim_worker.
(pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.

MAINTAINERS: Change my email address.

I'm at Ventana now. Change my email address accordingly. Also, add
myself to the DCO list.

ChangeLog:

* MAINTAINERS: Change my email address.

Unloop loops that no longer loops in tree-ssa-loop-ch

I noticed this after adding sanity check that the upper bound on number
of iterations never drop to -1. It seems to be relatively common case
(happening few hundred times in testsuite and also during bootstrap)
that loop-ch duplicates enough so the loop itself no longer loops.

This is later detected in loop unrolling but since we test the number
of iterations anyway it seems better to do that earlier.

* cfgloopmanip.h (unloop_loops): Export.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Unloop loops
that no longer loop.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Export; do not free
vectors of loops to unloop.
(canonicalize_induction_variables): Free vectors here.
(tree_unroll_loops_completely): Free vectors here.

tree-optimization/109170 - bogus use-after-free with __builtin_expect

The following generalizes the range-op for __builtin_expect
by using the fnspec machinery.

PR tree-optimization/109170
* gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
Handle __builtin_expect and similar via cfn_pass_through_arg1
and inspecting the calls fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.

Use CONFIG_SHELL-/bin/sh in genmultilib

There are still shells on some systems that lack the ability to start
scripts when not using the shell name explicitly. Adjust genmultilib
to use ${CONFIG_SHELL-/bin/sh} the same way configure does.

for gcc/ChangeLog

* genmultilib: Use CONFIG_SHELL to run sub-scripts.

Normalize addresses in IPA before calling range_op_handler [PR109639]

The old legacy code would allow building ranges containing symbolics,
even though the entire ranger ecosystem does not handle them.  These
were normalized into non-zero ranges by helper functions in VRP
(range_fold_*_expr) before calling the ranger.  The only users of
these functions should have been legacy VRP, which is no more.
However, a handful of users crept into IPA, even though these
functions shouldn't never been called outside of VRP or vr-values.

The issue here is that IPA is building a range of [&foo, &foo] and
expecting range_fold_binary to normalize it to non-zero.  Fixed by
adding a helper function before calling the range_op handler.

I think these covers the problematic ranges.  If not, I'll come up
with something more generalized that does not involve polluting
irange::set with the normalization code.  After all, this only
involves a handful of IPA places.

I've also added an assert in irange::set() making it easier to detect
any possible fallout without having to drill deep into the setter.

gcc/ChangeLog:

PR tree-optimization/109639
* ipa-cp.cc (ipa_value_range_from_jfunc): Normalize range.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.h (ipa_range_set_and_normalize): New.
* value-range.cc (irange::set): Assert min and max are INTEGER_CST.

wrong GIMPLE from (bit_field_ref CTOR ..) simplification

When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 }
and attempt to produce (view converted) { _1, _2 } for a selected
subset we fail to realize this cannot be done from match.pd since
we have no way to write the resulting CTOR "operation" and the
built CTOR { _1, _2 } isn't a GIMPLE value.

This kind of simplifications have to be done in forwprop (or would
need a match.pd syntax extension) where we can split out the CTOR
to a separate stmt.

The following disables this particular simplification when we are
simplifying GIMPLE. With enhanced IL checking this otherwise
causes ICEs in the testsuite from vectorized code.

* match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not
create a CTOR operand in the result when simplifying GIMPLE.

Properly gimplify handled component chains on registers

When for example complex lowering wants to extract the imaginary
part of a complex variable for lowering a complex move we can
end up with it generating __imag <VIEW_CONVERT_EXPR <_22> > which
is valid GENERIC. It then feeds that to the gimplifier via
force_gimple_operand but that fails to split up this chain
of handled components, generating invalid GIMPLE catched by
verification when PR109644 is fixed.

The following rectifies this by noting in gimplify_compound_lval
when the base object which we gimplify first ends up being a
register.

* gimplify.cc (gimplify_compound_lval): When the base
gimplified to a register make sure to split up chains
of operations.

ipa/109607 - properly gimplify conversions introduced by IPA param manipulation

The following addresses IPA param manipulation (through IPA SRA)
replacing

  BIT_FIELD_REF <*this_8(D), 8, 56>

with

  BIT_FIELD_REF <VIEW_CONVERT_EXPR<const struct profile_count>(ISRA.814), 8, 56>

which is supposed to be invalid GIMPLE (ISRA.814 is a register).
There's currently insufficient checking in place to catch this in the
IL verifier but I am working on that as part of fixing PR109594.

The solution for the particular testcase I am running into this is
to split the conversion to a separate stmt.  Generally the modification
phase is set up for this but the extra_stmts sequence isn't passed
around everywhere.  The following passes it to modify_expression
from modify_assignment when rewriting the RHS.

PR ipa/109607
* ipa-param-manipulation.h
(ipa_param_body_adjustments::modify_expression): Add extra_stmts
argument.
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Likewise.
When we need a conversion and the replacement is a register
split the conversion out.
(ipa_param_body_adjustments::modify_assignment): Pass
extra_stmts to RHS modify_expression.

* g++.dg/torture/pr109607.C: New testcase.

libstdc++: Fix typos in doxygen comments

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h: Fix typo in doxygen comment.
* include/std/format: Likewise.

libstdc++: Remove obsolete options from Doxygen config

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (FORMULA_TRANSPARENT, DOT_FONTNAME)
(DOT_FONTSIZE, DOT_TRANSPARENT): Remove obsolete options.

libstdc++: Reduce Doxygen output for PDF

Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.

There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.

libstdc++: Improve doxygen docs for <memory_resource>

libstdc++-v3/ChangeLog:

* include/bits/memory_resource.h: Improve doxygen comments.
* include/std/memory_resource: Likewise.

libstdc++: Add @headerfile and @since to doxygen comments [PR40380]

libstdc++-v3/ChangeLog:

PR libstdc++/40380
* include/bits/basic_string.h: Improve doxygen comments.
* include/bits/cow_string.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/fs_dir.h: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/quoted_string.h: Likewise.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/std/filesystem: Likewise.
* include/std/iomanip: Likewise.

libstdc++: Make std::random_device throw std::system_error [PR105081]

This changes std::random_device constructors to throw std::system_error
(with EINVAL as the error code) when the constructor argument is
invalid. We can also throw std::system_error when read(2) fails so that
the exception includes the additional information provided by errno.

As noted in the PR, this is consistent with libc++, and doesn't break
any existing code which catches std::runtime_error, because those
handlers will still catch std::system_error.

libstdc++-v3/ChangeLog:

PR libstdc++/105081
* src/c++11/random.cc (__throw_syserr): New function.
(random_device::_M_init, random_device::_M_init_pretr1): Use new
function for bad tokens.
(random_device::_M_getval): Use new function for read errors.
* testsuite/util/testsuite_random.h (random_device_available):
Change catch handler to use std::system_error.

c: Fix up error-recovery on non-empty VLA initializers [PR109409]

On the following testcase we ICE, because after we emit the
variable-sized object may not be initialized except with an empty initializer
error we don't really reset the initializer to error_mark_node and then at
-Wformat checking time we ICE on seeing STRING_CST initializer for a VLA.

The following patch just arranges for error_mark_node to be returned after
the error diagnostics.

2023-04-27 Jakub Jelinek <jakub@redhat.com>

PR c/109409
* c-parser.cc (c_parser_initializer): Move diagnostics about
initialization of variable sized object with non-empty initializer
after c_parser_expr_no_commas call and ret.set_error (); after it.

* gcc.dg/pr109409.c: New test.

c: Fix up error-recovery on functions initialized as variables [PR109412]

The change to allow empty initializers in C broke error-recovery on the
following testcase.  We are emitting function %qD is initialized like a
variable error early; if the initializer is non-empty, we just emit
another error that the initializer is invalid.  Previously if it was empty,
we'd emit another error that scalar is being initialized by empty
initializer (not really correct), but now we instead just try to
build_zero_cst for the FUNCTION_TYPE and ICE on it.

The following patch just emits the same diagnostics for the empty
initializers as we emit for the non-empty ones.

2023-04-27  Jakub Jelinek  <jakub@redhat.com>

PR c/107682
PR c/109412
* c-typeck.cc (pop_init_level): If constructor_type is FUNCTION_TYPE,
reject empty initializer as invalid.

* gcc.dg/pr109412.c: New test.

doc: Add explanation of zero-length array example

gcc/ChangeLog:

* doc/extend.texi (Zero Length): Describe example.

tree-optimization/109594 - wrong register promotion

We fail to verify the constraints under which we allow handled
components to wrap registers.  The gcc.dg/pr70022.c testcase shows
that we happily end up with

  _2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))

as produced by SSA rewrite and update_address_taken.  But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.

Together with the improved gimple_load predicate taking advantage
of the above and ASAN this eventually ICEd.

The following fixes update_address_taken as to this constraint.

PR tree-optimization/109594
* tree-ssa.cc (non_rewritable_mem_ref_base): Constrain
what we rewrite to a register based on the above.

testsuite: adjust NOP expectations for RISC-V

RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.

gcc/testsuite/

* c-c++-common/patchable_function_entry-decl.c: Special-case
RISC-V.
* c-c++-common/patchable_function_entry-default.c: Likewise.
* c-c++-common/patchable_function_entry-definition.c: Likewise.

c++: restore instantiate_decl assert

For PR61445 I removed this assert, but PR108242 demonstrated why it's still
useful; to avoid regressing the former testcase I check pattern_defined
in the assert.

This reverts r212524.

PR c++/61445

gcc/cp/ChangeLog:

* pt.cc (instantiate_decl): Assert !defer_ok for local
class members.

Daily bump.

libgcc CRIS: Define TARGET_HAS_NO_HW_DIVIDE

With this, execution time for e.g. __moddi3 go from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by gcc.c-torture/execute/arith-rand-ll.c
compiled for -march=v10.

Unfortunately, it just puts a performance improvement "dent" of 0.07%
in a arith-rand-ll.c-based performance test - where all loops are also
reduced to 1/10.

The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.

* config/cris/t-cris (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.

RISC-V: Fix sync.md and riscv.cc whitespace errors

This patch fixes whitespace errors introduced with
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html

2023-04-26 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv.cc: Fix whitespace.
* config/riscv/sync.md: Fix whitespace.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

c++: remove nsdmi_inst hashtable

It occurred to me that we have a perfectly good DECL_INITIAL field to put
the instantiated DMI into, we don't need a separate hash table.

gcc/cp/ChangeLog:

* init.cc (nsdmi_inst): Remove.
(maybe_instantiate_nsdmi_init): Use DECL_INITIAL instead.

c++: local class in nested generic lambda [PR109241]

The earlier fix for PR109241 avoided the crash by handling a type with no
TREE_BINFO. But we want to move toward doing the partial substitution of
classes in generic lambdas, so let's take a step in that direction.

PR c++/109241

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Do partially instantiate.
(tsubst_expr): Do call complete_type for partial instantiations.

c++: unique friend shenanigans [PR69836]

Normally we re-instantiate a function declaration when we start to
instantiate the body in case of multiple declarations.  In this wacky
testcase, this causes a problem because the type of the w_counter parameter
depends on its declaration not being in scope yet, so the name lookup only
finds the previous declaration.  This isn't a problem for member functions,
since they aren't subject to argument-dependent lookup.  So let's just skip
the regeneration for hidden friends.

PR c++/69836

gcc/cp/ChangeLog:

* pt.cc (regenerate_decl_from_template): Skip unique friends.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend76.C: New test.

c++: micro-optimize most_specialized_partial_spec

This introduces an early exit test to most_specialized_partial_spec for
templates which have no partial specializations, saving some unnecessary
work during class template instantiation in the common case.  In passing,
modernize the code a bit.

gcc/cp/ChangeLog:

* pt.cc (most_specialized_partial_spec): Exit early when
DECL_TEMPLATE_SPECIALIZATIONS is empty.  Move local variable
declarations closer to their first use.  Remove redundant
flag_concepts test.  Remove redundant forward declaration.

Create a lazy ssa_cache.

Sparsely used ssa caches can benefit from using a bitmap to
determine if a name already has an entry. Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.

PR tree-optimization/108697
* gimple-range-cache.cc (ssa_global_cache::clear_range): Do
not clear the vector on an out of range query.
(ssa_cache::dump): Use dump_range_query instead of get_range.
(ssa_cache::dump_range_query): New.
(ssa_lazy_cache::dump_range_query): New.
(ssa_lazy_cache::set_range): New.
* gimple-range-cache.h (ssa_cache::dump_range_query): New.
(class ssa_lazy_cache): New.
(ssa_lazy_cache::ssa_lazy_cache): New.
(ssa_lazy_cache::~ssa_lazy_cache): New.
(ssa_lazy_cache::get_range): New.
(ssa_lazy_cache::clear_range): New.
(ssa_lazy_cache::clear): New.
(ssa_lazy_cache::dump): New.
* gimple-range-path.cc (path_range_query::path_range_query): Do
not allocate a ssa_cache object nor has_cache bitmap.
(path_range_query::~path_range_query): Do not free objects.
(path_range_query::clear_cache): Remove.
(path_range_query::get_cache): Adjust.
(path_range_query::set_cache): Remove.
(path_range_query::dump): Don't call through a pointer.
(path_range_query::internal_range_of_expr): Set cache directly.
(path_range_query::reset_path): Clear cache directly.
(path_range_query::ssa_range_in_phi): Fold with globals only.
(path_range_query::compute_ranges_in_phis): Simply set range.
(path_range_query::compute_ranges_in_block): Call cache directly.
* gimple-range-path.h (class path_range_query): Replace bitmap
and cache pointer with lazy cache object.
* gimple-range.h (class assume_query): Use ssa_lazy_cache.

Rename ssa_global_cache to ssa_cache and add has_range

This renames the ssa_global_cache to be ssa_cache.  The original use was
to function as a global cache, but its uses have expanded.  Remove all mention
of "global" from the class and methods.  Also add a has_range method.

* gimple-range-cache.cc (ssa_cache::ssa_cache): Rename.
(ssa_cache::~ssa_cache): Rename.
(ssa_cache::has_range): New.
(ssa_cache::get_range): Rename.
(ssa_cache::set_range): Rename.
(ssa_cache::clear_range): Rename.
(ssa_cache::clear): Rename.
(ssa_cache::dump): Rename and use get_range.
(ranger_cache::get_global_range): Use get_range and set_range.
(ranger_cache::range_of_def): Use get_range.
* gimple-range-cache.h (class ssa_cache): Rename class and methods.
(class ranger_cache): Use ssa_cache.
* gimple-range-path.cc (path_range_query::path_range_query): Use
ssa_cache.
(path_range_query::get_cache): Use get_range.
(path_range_query::set_cache): Use set_range.
* gimple-range-path.h (class path_range_query): Use ssa_cache.
* gimple-range.cc (assume_query::assume_range_p): Use get_range.
(assume_query::range_of_expr): Use get_range.
(assume_query::assume_query): Use set_range.
(assume_query::calculate_op): Use get_range and set_range.
* gimple-range.h (class assume_query): Use ssa_cache.

Add sbr_lazy_vector and adjust (e)vrp sparse cache

Add a sparse vector class for cache and use if by default.
Rename the evrp_* params to vrp_*, and add a param for small CFGS which use
just the original basic vector.

* gimple-range-cache.cc (sbr_vector::sbr_vector): Add parameter
and local to optionally zero memory.
(br_vector::grow): Only zero memory if flag is set.
(class sbr_lazy_vector): New.
(sbr_lazy_vector::sbr_lazy_vector): New.
(sbr_lazy_vector::set_bb_range): New.
(sbr_lazy_vector::get_bb_range): New.
(sbr_lazy_vector::bb_range_p): New.
(block_range_cache::set_bb_range): Check flags and Use sbr_lazy_vector.
* gimple-range-gori.cc (gori_map::calculate_gori): Use
param_vrp_switch_limit.
(gori_compute::gori_compute): Use param_vrp_switch_limit.
* params.opt (vrp_sparse_threshold): Rename from evrp_sparse_threshold.
(vrp_switch_limit): Rename from evrp_switch_limit.
(vrp_vector_threshold): New.

Quicker relation check.

If either of the SSA names in a comparison do not have any equivalences
or relations, we can short-circuit the check slightly.

* value-relation.cc (dom_oracle::query_relation): Check early for lack
of any relation.
* value-relation.h (equiv_oracle::has_equiv_p): New.

Don't save ssa-name pointer in dependency cache.

If the direct dependence fields point directly to an ssa-name,
its possible that an optimization frees an ssa-name, and the value
pointed to may now be in the free list. Simply maintain the ssa
version number instead.

PR tree-optimization/109417
* gimple-range-gori.cc (range_def_chain::register_dependency):
Save the ssa version number, not the pointer.
(gori_compute::may_recompute_p): No need to check if a dependency
is in the free list.
* gimple-range-gori.h (class range_def_chain): Change ssa1 and ssa2
fields to be unsigned int instead of trees.
(ange_def_chain::depend1): Adjust.
(ange_def_chain::depend2): Adjust.
* gimple-range.h: Include "ssa.h" to inline ssa_name().

aix: Default AIX 7.2 to POWER7 server and AIX 7.3 to POWER8 server.

AIX 7.2 minimum ISA is POWER7 and AIX 7.3 minimum ISA is POWER8.
This patch changes the aix72.h configuration to POWER7 with VSX enabled
by default (with the AIX VSX ABI limitations), matching LLVM on AIX,
and changes the aix73.h configuration to POWER8.

gcc/ChangeLog:
* config/rs6000/aix72.h (TARGET_DEFAULT): Use ISA_2_6_MASKS_SERVER.
* config/rs6000/aix73.h (TARGET_DEFAULT): Use ISA_2_7_MASKS_SERVER.
(PROCESSOR_DEFAULT): Use PROCESSOR_POWER8.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

RISCV: Inline subword atomic ops

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.

libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.

gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

MAINTAINERS: Add myself to write after approval

2023-04-26 Patrick O'Neill <patrick@rivosinc.com>

* MAINTAINERS: Add myself.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

aarch64: Reimplement RSHRN2 intrinsic patterns with standard RTL codes

Similar to the previous patch, we can reimplement the rshrn2 patterns using standard RTL codes
for shift, truncate and plus with the appropriate constants.
This allows us to get rid of UNSPEC_RSHRN entirely.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_rshrn2<mode>_insn_le):
Reimplement using standard RTL codes instead of unspec.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_RSHRN): Delete.

aarch64: Reimplement RSHRN intrinsic patterns with standard RTL codes

This patch reimplements the backend patterns for the rshrn intrinsics using standard RTL codes rather than UNSPECS.
We already represent shrn as truncate of a shift. rshrn can be represented as truncate (src + (1 << (shft - 1)) >> shft),
similar to how LLVM treats it.

I have a follow-up patch to do the same for the rshrn2 pattern, which will allow us to remove the UNSPEC_RSHRN entirely.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>_insn_le): Reimplement
with standard RTL codes instead of an UNSPEC.
(aarch64_rshrn<mode>_insn_be): Likewise.
(aarch64_rshrn<mode>): Adjust for the above.
* config/aarch64/predicates.md (aarch64_simd_rshrn_imm_vec): Define.

libsanitizer: change LOCAL_PATCHES revision

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Change revision.

libsanitizer: Apply local patches

libsanitizer: merge from upstream (3185e47b5a8444e9fd).

RISC-V: Legitimise the const0_rtx for RVV load/store address

This patch try to legitimise the const0_rtx (aka zero register)
as the base register for the RVV load/store instructions.

For example:
vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
{
  return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
}

Before this patch:
li      a5,0
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(a5)  <- can propagate the const 0 to a5 here
vs1r.v  v24,0(a0)

After this patch:
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(zero)
vs1r.v  v24,0(a0)

As above, this patch allow you to propagate the const 0 (aka zero
register) to the base register of the RVV Unit-Stride load in the
combine pass. This may benefit the underlying RVV auto-vectorization.

However, the indexed load failed to perform the optimization and it
will be take care of in another PATCH.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_classify_address): Allow
const0_rtx for the RVV load/store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Remove legacy range support.

This patch removes all the code paths guarded by legacy_mode_p(), thus
allowing us to re-use the int_range<1> idiom for a range of one
sub-range. This allows us to represent these simple ranges in a more
efficient manner.

gcc/ChangeLog:

* range-op.cc (range_op_cast_tests): Remove legacy support.
* value-range-storage.h (vrange_allocator::alloc_irange): Same.
* value-range.cc (irange::operator=): Same.
(get_legacy_range): Same.
(irange::copy_legacy_to_multi_range): Delete.
(irange::copy_to_legacy): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Remove legacy support.
(irange::verify_range): Same.
(irange::legacy_lower_bound): Delete.
(irange::legacy_upper_bound): Delete.
(irange::legacy_equal_p): Delete.
(irange::operator==): Remove legacy support.
(irange::singleton_p): Same.
(irange::value_inside_range): Same.
(irange::contains_p): Same.
(intersect_ranges): Delete.
(irange::legacy_intersect): Delete.
(union_ranges): Delete.
(irange::legacy_union): Delete.
(irange::legacy_verbose_union_): Delete.
(irange::legacy_verbose_intersect): Delete.
(irange::irange_union): Remove legacy support.
(irange::irange_intersect): Same.
(irange::intersect): Same.
(irange::invert): Same.
(ranges_from_anti_range): Delete.
(gt_pch_nx): Adjust for legacy removal.
(gt_ggc_mx): Same.
(range_tests_legacy): Delete.
(range_tests_misc): Adjust for legacy removal.
(range_tests): Same.
* value-range.h (class irange): Same.
(irange::legacy_mode_p): Delete.
(ranges_from_anti_range): Delete.
(irange::nonzero_p): Adjust for legacy removal.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
* vr-values.cc (simplify_using_ranges::legacy_fold_cond_overflow): Same.

Remove range_has_numeric_bounds_p.

gcc/ChangeLog:

* value-range.cc (irange::copy_legacy_to_multi_range): Rewrite use
of range_has_numeric_bounds_p with irange API.
(range_has_numeric_bounds_p): Delete.
* value-range.h (range_has_numeric_bounds_p): Delete.

Remove range_int_cst_p.

gcc/ChangeLog:

* tree-data-ref.cc (compute_distributive_range): Replace uses of
range_int_cst_p with irange API.
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Same.
* tree-vrp.h (range_int_cst_p): Delete.
* vr-values.cc (check_for_binary_op_overflow): Replace usees of
range_int_cst_p with irange API.
(vr_set_zero_nonzero_bits): Same.
(range_fits_type_p): Same.
(simplify_using_ranges::simplify_casted_cond): Same.
* tree-vrp.cc (range_int_cst_p): Remove.

Convert compare_nonzero_chars to wide_ints.

gcc/ChangeLog:

* tree-ssa-strlen.cc (compare_nonzero_chars): Convert to wide_ints.

Remove some uses of deprecated irange API.

gcc/ChangeLog:

* builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange
API uses to new API.
* gimple-predicate-analysis.cc (find_var_cmp_const): Same.
* internal-fn.cc (get_min_precision): Same.
* match.pd: Same.
* tree-affine.cc (expr_to_aff_combination): Same.
* tree-data-ref.cc (dr_step_indicator): Same.
* tree-dfa.cc (get_ref_base_and_extent): Same.
* tree-scalar-evolution.cc (iv_can_overflow_p): Same.
* tree-ssa-phiopt.cc (two_value_replacement): Same.
* tree-ssa-pre.cc (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.cc (compare_nonzero_chars): Same.
* tree-switch-conversion.cc (bit_test_cluster::emit): Same.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Same.
* tree.cc (get_range_pos_neg): Same.

Replace ad-hoc value_range dumpers with irange::dump.

This causes a regression in gcc.c-torture/unsorted/dump-noaddr.c.

The test is asserting that two dumps are identical, but they are not
because irange dumps the type which varies between runs:

< VR [irange] void (*<T3dc>) (int) [1, +INF]
> VR [irange] void (*<T3da>) (int) [1, +INF]

I have changed the pretty printer for irange types to pass TDF_NOUID,
thus avoiding this problem.

gcc/ChangeLog:

* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Use
vrange::dump instead of ad-hoc dumper.
* tree-ssa-strlen.cc (dump_strlen_info): Same.
* value-range-pretty-print.cc (visit): Pass TDF_NOUID to
dump_generic_node.

Fix swapping of ranges.

The legacy range code has logic to swap out of order endpoints in the
irange constructor.  The new irange code expects the caller to fix any
inconsistencies, thus speeding up the common case.  However, this means
that when we remove legacy, any stragglers must be fixed.  This patch
fixes the 3 culprits found during the conversion.

gcc/ChangeLog:

* range-op.cc (operator_cast::op1_range): Use
create_possibly_reversed_range.
(operator_bitwise_and::simple_op1_range_solver): Same.
* value-range.cc (swap_out_of_order_endpoints): Delete.
(irange::set): Remove call to swap_out_of_order_endpoints.

Convert users of legacy API to get_legacy_range() function.

This patch converts the users of the legacy API to a function called
get_legacy_range() which will return the pieces of the soon to be
removed API (min, max, and kind).  This is a temporary measure while
these users are converted.

In upcoming patches I will convert most users, but most of the
middle-end warning uses will remain.  Naive attempts to remove them
showed that a lot of these uses are quite dependant on the anti-range
idiom, and converting them to the new API broke the tests, even when
the conversion was conceptually correct.  Perhaps someone who
understands these passes could take a stab at it.  In the meantime,
the legacy uses can be trivially found by grepping for
get_legacy_range.

gcc/ChangeLog:

* builtins.cc (determine_block_size): Convert use of legacy API to
get_legacy_range.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Same.
(array_bounds_checker::check_array_ref): Same.
* gimple-ssa-warn-restrict.cc
(builtin_memref::extend_offset_range): Same.
* ipa-cp.cc (ipcp_store_vr_results): Same.
* ipa-fnsummary.cc (set_switch_stmt_execution_predicate): Same.
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same.
(ipa_write_jump_function): Same.
* pointer-query.cc (get_size_range): Same.
* tree-data-ref.cc (split_constant_offset): Same.
* tree-ssa-strlen.cc (get_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(strlen_pass::get_len_or_size): Same.
(strlen_pass::count_nonzero_bytes_addr): Same.
* tree-vect-patterns.cc (vect_get_range_info): Same.
* value-range.cc (irange::maybe_anti_range): Remove.
(get_legacy_range): New.
(irange::copy_to_legacy): Use get_legacy_range.
(ranges_from_anti_range): Same.
* value-range.h (class irange): Remove maybe_anti_range.
(get_legacy_range): New.
* vr-values.cc (check_for_binary_op_overflow): Convert use of
legacy API to get_legacy_range.
(compare_ranges): Same.
(compare_range_with_value): Same.
(bounds_of_var_in_loop): Same.
(find_case_label_ranges): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.

Remove irange::constant_p.

gcc/ChangeLog:

* value-range-pretty-print.cc (vrange_printer::visit): Remove
constant_p use.
* value-range.cc (irange::constant_p): Remove.
(irange::get_nonzero_bits_from_range): Remove constant_p use.
* value-range.h (class irange): Remove constant_p.
(irange::num_pairs): Remove constant_p use.