git.ipfire.org Git - thirdparty/gcc.git/log

Testsuite, Darwin: actually skip test

Previous commit xfailed instead of skipping, but we really
want to skip.

gcc/testsuite/ChangeLog:

* gcc.target/i386/libcall-1.c: Skip on darwin.

RISC-V: Support highest overlap for wv instructions

According to the RVV ISA, the overlap in vwadd.wv v2, v2, v3 is allowed.

Before this patch:

        nop
        vsetivli        zero,4,e8,m4,tu,ma
        vle16.v v8,0(a0)
        vmv8r.v v0,v8
        vwsub.wv        v0,v8,v12
        nop
        addi    a4,a0,100
        vle16.v v8,0(a4)
        vmv8r.v v24,v8
        vwsub.wv        v24,v8,v12
        nop
        addi    a4,a0,200
        vle16.v v8,0(a4)
        vmv8r.v v16,v8
        vwsub.wv        v16,v8,v12
        nop

After this patch:

        nop
        vsetivli        zero,4,e8,m4,tu,ma
        vle16.v v0,0(a0)
        vwsub.wv        v0,v0,v4
        nop
        addi    a4,a0,100
        vle16.v v24,0(a4)
        vwsub.wv        v24,v24,v28
        nop
        addi    a4,a0,200
        vle16.v v16,0(a4)
        vwsub.wv        v16,v16,v20
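
As a rough illustration of the overlap rule this patch relies on, the sketch
below models register groups and checks one destination/source pair.
RegGroup, groups_overlap and overlap_is_legal are hypothetical names, only
integer EMUL is modelled, and the three conditions are a reading of the RVV
overlap constraints, not code taken from vector.md.

```cpp
// Hypothetical model of a vector register group: v<base> .. v<base+emul-1>.
// Only integer EMUL (1, 2, 4, 8) is modelled in this sketch.
struct RegGroup {
  int base;  // lowest register number in the group
  int emul;  // number of registers in the group
  int eew;   // effective element width in bits
};

// True when the two groups share at least one register.
constexpr bool groups_overlap(const RegGroup &a, const RegGroup &b) {
  return a.base < b.base + b.emul && b.base < a.base + a.emul;
}

// Sketch of the overlap constraint between a destination and one source.
constexpr bool overlap_is_legal(const RegGroup &dest, const RegGroup &src) {
  if (!groups_overlap(dest, src))
    return true;                    // disjoint groups need no check
  if (dest.eew == src.eew)
    return true;                    // same EEW: overlap is allowed
  if (dest.eew < src.eew)
    return dest.base == src.base;   // narrowing: lowest part of the source
  // Widening: the source must occupy the highest-numbered part of the
  // destination group (source EMUL >= 1 holds by construction here).
  return src.base + src.emul == dest.base + dest.emul;
}
```

With e8,m4 the widened group is v0-v7 at EEW 16, so a narrow source in v4-v7
passes this check while one in v0-v3 does not.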

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highest overlap for wv instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-39.c: New test.
* gcc.target/riscv/rvv/base/pr112431-40.c: New test.
* gcc.target/riscv/rvv/base/pr112431-41.c: New test.

RISC-V: Fix ICE in extract_single_source

This patch fixes the following ICE in VSETVL PASS:
bug.c:39:1: internal compiler error: Segmentation fault
   39 | }
      | ^
0x1ad5a08 crash_signal
        ../../../../gcc/gcc/toplev.cc:316
0x7f7f55feb90f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x218d7c7 extract_single_source
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:583
0x218d95d extract_single_source
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:604
0x218fbc5 pre_vsetvl::compute_lcm_local_properties()
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2703
0x2190ef4 pre_vsetvl::earliest_fuse_vsetvl_info()
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2890
0x2193e62 pass_vsetvl::lazy_vsetvl()
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3537
0x219406a pass_vsetvl::execute(function*)
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3584

The root cause is that we have a case where the def info cannot be traced:

(insn 208 327 333 27 (use (reg/i:DI 10 a0)) "bug.c":36:1 -1
     (nil))

The fix is obvious: we conservatively disable any optimization in this
situation if the AVL def_info cannot be traced.

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (extract_single_source): Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c: New test.

extend.texi: Mark builtin arguments with @var{...}

In many cases we just specify types for the builtin arguments, in other cases
types and names with @var{name} syntax, and in other cases just the name.

Shall we tweak that somehow?  If the argument names are unimportant, perhaps
it is fine to leave that out, but shouldn't we always use @var{...} around
the parameter names when specified?

On Fri, Dec 01, 2023 at 10:43:57AM -0700, Sandra Loosemore wrote:
> Yup.  The Texinfo manual says:  "When using @deftypefn command and
> variations, you should mark parameter names with @var to distinguish these
> from data type names, keywords, and other parts of the literal syntax of the
> programming language."

Here is a patch which does that (but not adding types to where they were
missing, that will be harder to search for).

2023-12-11  Jakub Jelinek  <jakub@redhat.com>

* doc/extend.texi (__sync_fetch_and_add, __sync_fetch_and_sub,
__sync_fetch_and_or, __sync_fetch_and_and, __sync_fetch_and_xor,
__sync_fetch_and_nand, __sync_add_and_fetch, __sync_sub_and_fetch,
__sync_or_and_fetch, __sync_and_and_fetch, __sync_xor_and_fetch,
__sync_nand_and_fetch, __sync_bool_compare_and_swap,
__sync_val_compare_and_swap, __sync_lock_test_and_set,
__sync_lock_release, __atomic_load_n, __atomic_load, __atomic_store_n,
__atomic_store, __atomic_exchange_n, __atomic_exchange,
__atomic_compare_exchange_n, __atomic_compare_exchange,
__atomic_add_fetch, __atomic_sub_fetch, __atomic_and_fetch,
__atomic_xor_fetch, __atomic_or_fetch, __atomic_nand_fetch,
__atomic_fetch_add, __atomic_fetch_sub, __atomic_fetch_and,
__atomic_fetch_xor, __atomic_fetch_or, __atomic_fetch_nand,
__atomic_test_and_set, __atomic_clear, __atomic_thread_fence,
__atomic_signal_fence, __atomic_always_lock_free,
__atomic_is_lock_free, __builtin_add_overflow,
__builtin_sadd_overflow, __builtin_saddl_overflow,
__builtin_saddll_overflow, __builtin_uadd_overflow,
__builtin_uaddl_overflow, __builtin_uaddll_overflow,
__builtin_sub_overflow, __builtin_ssub_overflow,
__builtin_ssubl_overflow, __builtin_ssubll_overflow,
__builtin_usub_overflow, __builtin_usubl_overflow,
__builtin_usubll_overflow, __builtin_mul_overflow,
__builtin_smul_overflow, __builtin_smull_overflow,
__builtin_smulll_overflow, __builtin_umul_overflow,
__builtin_umull_overflow, __builtin_umulll_overflow,
__builtin_add_overflow_p, __builtin_sub_overflow_p,
__builtin_mul_overflow_p, __builtin_addc, __builtin_addcl,
__builtin_addcll, __builtin_subc, __builtin_subcl, __builtin_subcll,
__builtin_alloca, __builtin_alloca_with_align,
__builtin_alloca_with_align_and_max, __builtin_speculation_safe_value,
__builtin_nan, __builtin_nand32, __builtin_nand64, __builtin_nand128,
__builtin_nanf, __builtin_nanl, __builtin_nanf@var{n},
__builtin_nanf@var{n}x, __builtin_nans, __builtin_nansd32,
__builtin_nansd64, __builtin_nansd128, __builtin_nansf,
__builtin_nansl, __builtin_nansf@var{n}, __builtin_nansf@var{n}x,
__builtin_ffs, __builtin_clz, __builtin_ctz, __builtin_clrsb,
__builtin_popcount, __builtin_parity, __builtin_bswap16,
__builtin_bswap32, __builtin_bswap64, __builtin_bswap128,
__builtin_extend_pointer, __builtin_goacc_parlevel_id,
__builtin_goacc_parlevel_size, vec_clrl, vec_clrr, vec_mulh, vec_mul,
vec_div, vec_dive, vec_mod, __builtin_rx_mvtc): Use @var{...} around
parameter names.
(vec_rl, vec_sl, vec_sr, vec_sra): Likewise.  Use @var{...} also
around A, B and R in description.

RISC-V: Remove poly selftest when --preference=fixed-vlmax

This patch fixes multiple ICEs in full coverage testing:

cc1: internal compiler error: in riscv_legitimize_poly_move, at config/riscv/riscv.cc:2456
0x1fd8d78 riscv_legitimize_poly_move
        ../../../../gcc/gcc/config/riscv/riscv.cc:2456
0x1fd9518 riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
        ../../../../gcc/gcc/config/riscv/riscv.cc:2583
0x2936820 gen_movdi(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/config/riscv/riscv.md:2099
0x11a0f28 rtx_insn* insn_gen_fn::operator()<rtx_def*, rtx_def*>(rtx_def*, rtx_def*) const
        ../../../../gcc/gcc/recog.h:431
0x13cf2f9 emit_move_insn_1(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/expr.cc:4553
0x13d010c emit_move_insn(rtx_def*, rtx_def*)
        ../../../../gcc/gcc/expr.cc:4723
0x216f5e0 run_poly_int_selftest
        ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:185
0x21701e6 run_poly_int_selftests
        ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:226
0x2172109 selftest::riscv_run_selftests()
        ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:371
0x3b8067b selftest::run_tests()
        ../../../../gcc/gcc/selftest-run-tests.cc:112
0x1ad90ee toplev::run_self_tests()
        ../../../../gcc/gcc/toplev.cc:2209

Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax

The root cause is that we are testing POLY value computation during FIXED-VLMAX, which ICEs in this code:

  if (BYTES_PER_RISCV_VECTOR.is_constant ())
    {
      gcc_assert (value.is_constant ());                           ----->  assert failed.
      riscv_emit_move (dest, GEN_INT (value.to_constant ()));
      return;
    }

For example, a poly value [15, 16] is computed by csrr vlenb plus multiple scalar integer instructions.

However, such a compile-time unknown value only needs to be computed for scalable vectors, that is, when !BYTES_PER_RISCV_VECTOR.is_constant (),
since csrr vlenb = [16, 0] when -march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax and we have no chance to compute a compile-time POLY value.

Also, we never need to compute a compile-time unknown value for a FIXED-VLMAX vector, so disable the POLY selftest for FIXED-VLMAX.
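
To make the [c0, c1] notation concrete, here is a minimal sketch that treats
such a pair as c0 + c1 * x, with x only known at run time. ToyPoly and the two
constants are hypothetical and deliberately simplified; this is not GCC's
poly_int class.

```cpp
#include <cstdint>

// Toy two-coefficient value c0 + c1 * x, where x is a run-time quantity.
// Hypothetical stand-in for the [c0, c1] pairs quoted above.
struct ToyPoly {
  std::int64_t c0;
  std::int64_t c1;

  // A value is a compile-time constant when it does not depend on x.
  constexpr bool is_constant() const { return c1 == 0; }

  // Value once the run-time quantity x is known.
  constexpr std::int64_t evaluate(std::int64_t x) const { return c0 + c1 * x; }
};

// With fixed-vlmax the bytes-per-vector value is [16, 0]: a plain constant.
constexpr ToyPoly fixed_vlmax_bytes{16, 0};

// The selftest value [15, 16] still depends on x, so it is not constant.
constexpr ToyPoly selftest_value{15, 16};
```

In this model the assertion quoted above corresponds to asking a non-constant
value such as [15, 16] for a constant while the vector size itself is constant.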

gcc/ChangeLog:

* config/riscv/riscv-selftests.cc (riscv_run_selftests):
Remove poly self test when FIXED-VLMAX.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/poly-selftest-1.c: New test.

[PATCH 3/5] [ifcvt] optimize x=c ? (y AND z) : y by RISC-V Zicond like insns

Take the following case for example.

CFLAGS: -march=rv64gc_zbb_zicond -mabi=lp64d -O2

long
test_AND_ceqz (long x, long y, long z, long c)
{
  if (c)
    x = y & z;
  else
    x = y;
  return x;
}

Before patch:

and a2,a1,a2
czero.eqz a0,a2,a3
czero.nez a3,a1,a3
or a0,a3,a0
ret

After patch:
and a0,a1,a2
czero.nez a1,a1,a3
or a0,a1,a0
ret
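
The equivalence of the two sequences can be checked with a small model. The
sketch below is an illustration only: czero_eqz and czero_nez follow the
Zicond semantics (result is zero when the condition register is zero or
non-zero respectively), and reference, before_patch and after_patch are
hypothetical helper names mirroring the listings above.

```cpp
// czero.eqz rd, rs1, rs2  ->  rd = (rs2 == 0) ? 0 : rs1
constexpr long czero_eqz(long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
// czero.nez rd, rs1, rs2  ->  rd = (rs2 != 0) ? 0 : rs1
constexpr long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// The source-level function: x = c ? (y & z) : y.
constexpr long reference(long y, long z, long c) { return c ? (y & z) : y; }

// Four-instruction sequence emitted before the patch.
constexpr long before_patch(long y, long z, long c) {
  long a2 = y & z;              // and       a2,a1,a2
  long a0 = czero_eqz(a2, c);   // czero.eqz a0,a2,a3
  long a3 = czero_nez(y, c);    // czero.nez a3,a1,a3
  return a3 | a0;               // or        a0,a3,a0
}

// Three-instruction sequence emitted after the patch.  It relies on
// y | (y & z) == y, so the AND result needs no conditional zeroing.
constexpr long after_patch(long y, long z, long c) {
  long a0 = y & z;              // and       a0,a1,a2
  long a1 = czero_nez(y, c);    // czero.nez a1,a1,a3
  return a1 | a0;               // or        a0,a1,a0
}
```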

Co-authored-by: Xiao Zeng <zengxiao@eswincomputing.com>
gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for AND.
(noce_bbs_ok_for_cond_zero_arith): Likewise.
(noce_try_cond_zero_arith): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: Add TCs for AND.

c++: Fix noexcept checking for trivial operations [PR96090]

This patch stops eager folding of trivial operations (construction and
assignment) from occurring when checking for noexceptness. This was
previously done in PR c++/53025, but only for copy/move construction,
and the __is_nothrow_xible builtins did not receive the same treatment
when they were added.

To handle `is_nothrow_default_constructible`, the patch also ensures
that when no parameters are passed we do value initialisation instead of
just building the constructor call: in particular, value-initialisation
doesn't necessarily actually invoke the constructor for trivial default
constructors, and so we need to handle this case as well.

This is contrary to the proposed resolution of CWG2820; for now we just
ensure it matches the behaviour of the `noexcept` operator and create
testcases formalising this, and if that issue gets accepted we can
revisit.

PR c++/96090
PR c++/100470

gcc/cp/ChangeLog:

* call.cc (build_over_call): Prevent folding of trivial special
members when checking for noexcept.
* method.cc (constructible_expr): Perform value-initialisation
for empty parameter lists.
(is_nothrow_xible): Treat as noexcept operator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept81.C: New test.
* g++.dg/ext/is_nothrow_constructible7.C: New test.
* g++.dg/ext/is_nothrow_constructible8.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Clear uninstantiated template friend when instantiating [PR104234]

Otherwise attempting to get the originating module declaration ICEs
because the DECL_CHAIN of an instantiated friend template is no longer
its context.

PR c++/104234
PR c++/112580

gcc/cp/ChangeLog:

* pt.cc (tsubst_template_decl): Clear
DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr104234.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Support vpcmov for V4HF/V4BF/V2HF/V2BF under TARGET_XOP.

gcc/ChangeLog:

PR target/112904
* config/i386/mmx.md (*xop_pcmov_<mode>): New define_insn.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr112904.C: New test.

rs6000: Guard fctid on PowerPC64 and PowerPC476

fctid is only supported on 64-bit Power processors and powerpc 476. It
should be guarded by this condition. The patch fixes the issue.

gcc/
PR target/112707
* config/rs6000/rs6000.h (TARGET_FCTID): Define.
* config/rs6000/rs6000.md (lrint<mode>di2): Add guard TARGET_FCTID.
* (lround<mode>di2): Replace TARGET_FPRND with TARGET_FCTID.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707.h: New.
* gcc.target/powerpc/pr112707-2.c: New.
* gcc.target/powerpc/pr112707-3.c: New.
* gcc.target/powerpc/pr88558-p7.c: Check fctid on ilp32 and
has_arch_ppc64 as it's now guarded by powerpc64.
* gcc.target/powerpc/pr88558-p8.c: Likewise.
* gfortran.dg/nint_p7.f90: Add powerpc64 target requirement as
lround<mode>di2 is now guarded by powerpc64.

rs6000: Enable lrint<mode>si2 on old archs with stfiwx enabled

The powerpc 32-bit processors (e.g. 5470) support the "fctiw" instruction,
but the instruction can't be generated on such platforms as the insn is
guarded by TARGET_POPCNTD.  The root cause is that SImode in float registers
is supported from Power7.  Actually the implementation of "fctiw" only needs
stfiwx, which is supported by the old 32-bit processors.  This patch
enables the "fctiw" expand for these processors.

gcc/
PR target/112707
* config/rs6000/rs6000.md (expand lrint<mode>si2): New.
(insn lrint<mode>si2): Rename to...
(*lrint<mode>si): ...this.
(lrint<mode>si_di): New.

gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707-1.c: New.

Daily bump.

Add some new DW_IDX_* constants

I've reimplemented the .debug_names code in GDB -- it was quite far
from being correct, and the new implementation is much closer to what
is specified by DWARF.

However, the new writer in GDB needs to emit some symbol properties,
so that the reader can be fully functional.  This patch adds a few new
DW_IDX_* constants, and tries to document the existing extensions as
well.  (My patch series adds more documentation of these to the GDB
manual as well.)

include/ChangeLog
2023-12-10  Tom Tromey  <tom@tromey.com>

* dwarf2.def (DW_IDX_GNU_internal, DW_IDX_GNU_external): Comment.
(DW_IDX_GNU_main, DW_IDX_GNU_language, DW_IDX_GNU_linkage_name):
New constants.

[PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd
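
The transformation works because a shift or rotate by zero leaves rs1
unchanged, so zeroing the shift amount is enough to select rs1. A minimal
sketch for the left-shift case follows; sll, shift_if_zero and
shift_if_nonzero are hypothetical helper names, and the 6-bit masking models
an RV64 shift.

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2  ->  rd = (rs2 == 0) ? 0 : rs1
constexpr std::uint64_t czero_eqz(std::uint64_t rs1, std::uint64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}
// czero.nez rd, rs1, rs2  ->  rd = (rs2 != 0) ? 0 : rs1
constexpr std::uint64_t czero_nez(std::uint64_t rs1, std::uint64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// Register shift that only looks at the low 6 bits of the amount.
constexpr std::uint64_t sll(std::uint64_t rs1, std::uint64_t rs2) {
  return rs1 << (rs2 & 63);
}

// rd = (rc == 0) ? (rs1 << rs2) : rs1
constexpr std::uint64_t shift_if_zero(std::uint64_t rs1, std::uint64_t rs2,
                                      std::uint64_t rc) {
  return sll(rs1, czero_nez(rs2, rc));  // czero.nez rd,rs2,rc ; sll rd,rs1,rd
}

// rd = (rc != 0) ? (rs1 << rs2) : rs1
constexpr std::uint64_t shift_if_nonzero(std::uint64_t rs1, std::uint64_t rs2,
                                         std::uint64_t rc) {
  return sll(rs1, czero_eqz(rs2, rc));  // czero.eqz rd,rs2,rc ; sll rd,rs1,rd
}
```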

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for shift
like op.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: Add tests for shift like op.

Co-authored-by: Xiao Zeng <zengxiao@eswincomputing.com>

aarch64: Fix invalid subregs for BE svread/write_za

Multi-register svread_za and svwrite_za are implemented using one
pattern per register count, with the register contents being bitcast
on entry (for writes) or return (for reads).  Previously we relied
on subregs for this, with the subreg for reads being handled by
target-independent code.  But using subregs isn't correct for many
big-endian cases, where following subreg rules often requires actual
instructions.  The semantics are instead supposed to be those of
svreinterpret.

gcc/
PR target/112931
PR target/112933
* config/aarch64/aarch64-protos.h (aarch64_sve_reinterpret): Declare.
* config/aarch64/aarch64.cc (aarch64_sve_reinterpret): New function.
* config/aarch64/aarch64-sve-builtins-sme.cc (svread_za_impl::expand)
(svwrite_za_impl::expand): Use it to cast the SVE register to the
right mode.

aarch64: Fix SMSTART/SMSTOP save/restore for BE

VNx16QI (the SVE register byte mode) is the only SVE mode for which
LD1 and LDR result in the same register layout for big-endian. It is
therefore the only mode for which we allow LDR and STR to be used for
big-endian SVE moves.

The SME support sometimes needs to use LDR and STR to save and restore
Z register contents around an SMSTART/SMSTOP SM. It therefore needs
to use VNx16QI regardless of the type of value that is stored in the
Z registers.

gcc/
PR target/112930
* config/aarch64/aarch64.cc (aarch64_sme_mode_switch_regs::add_reg):
Force specific SVE modes for single registers as well as structures.

aarch64: XFAIL some SME tests for BE

The z0_z23 tests rely on being able to propagate:

  (1) set of double-register z0-z1
  (2) copy of z0 to z28
  (3) use of z28

to a use of z0.  On LE targets it's regcprop that does this.
But regcprop punts on (2) because of:

  https://gcc.gnu.org/pipermail/gcc-patches/2002-July/081990.html

This patch therefore XFAILs the affected tests.

gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/uzp_bf16_x2.c: XFAIL z0_z23 tests
for big-endian.
* gcc.target/aarch64/sme2/acle-asm/uzp_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u8_x2.c: Likewise.

aarch64: Skip some SME register save tests on BE

Big-endian targets need to save Z8-Z15 in the same order as
the registers would appear for D8-D15, because the layout is
mandated by the EH ABI. BE targets therefore use ST1D instead
of the normal STR for those registers (but not for others).

That difference is already tested elsewhere and isn't important
for the SME tests. This patch therefore restricts the affected
tests to LE.

gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_5.c: Restrict tests that
contain Z8-Z23 saves to little-endian.
* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_1.c: Likewise.

aarch64: Add -funwind-tables to some tests

The .cfi scans in these tests failed for *-elf targets because
those targets don't enable .eh_frame info by default.

gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_1.c: Add -funwind-tables.
* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.

Fortran: allow NULL() for POINTER, OPTIONAL, CONTIGUOUS dummy [PR111503]

gcc/fortran/ChangeLog:

PR fortran/111503
* expr.cc (gfc_is_simply_contiguous): Determine characteristics of
NULL() from optional MOLD argument, otherwise treat as contiguous.
* primary.cc (gfc_variable_attr): Derive attributes of NULL(MOLD)
from MOLD.

gcc/testsuite/ChangeLog:

PR fortran/111503
* gfortran.dg/contiguous_14.f90: New test.

Fortran: function returning contiguous class array [PR105543]

gcc/fortran/ChangeLog:

PR fortran/105543
* resolve.cc (resolve_symbol): For a CLASS-valued function having a
RESULT clause, ensure that attr.class_ok is set for its symbol as
well as for its resolved result variable.

gcc/testsuite/ChangeLog:

PR fortran/105543
* gfortran.dg/contiguous_13.f90: New test.

doc: small tweak

Mention Objective-C++ here to be consistent with the surrounding C/ObjC
lines.

gcc/ChangeLog:

* doc/invoke.texi (-fpermissive): Mention ObjC++ for -Wnarrowing.

c++: Implement __remove_pointer built-in trait

This patch implements a built-in trait for std::remove_pointer.
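
A hedged sketch of how a library might consume such a type-yielding trait:
use the built-in when the compiler reports it, and fall back to the standard
trait otherwise. strip_pointer is a hypothetical name, and the fallback
definition of __has_builtin is only there for compilers that lack the macro.

```cpp
#include <type_traits>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Hypothetical wrapper: prefers the built-in trait when it is available.
template <typename T>
struct strip_pointer {
#if __has_builtin(__remove_pointer)
  using type = __remove_pointer(T);
#else
  using type = typename std::remove_pointer<T>::type;
#endif
};

// Both paths are expected to agree with std::remove_pointer.
static_assert(std::is_same<strip_pointer<int *>::type, int>::value,
              "pointer is removed");
static_assert(std::is_same<strip_pointer<int *const>::type, int>::value,
              "cv-qualified pointer is removed");
static_assert(std::is_same<strip_pointer<int>::type, int>::value,
              "non-pointer is unchanged");
```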

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_POINTER.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_pointer.
* g++.dg/ext/remove_pointer.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_object built-in trait

This patch implements a built-in trait for std::is_object.
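
For reference, std::is_object is true for every type that is not a function,
a reference or void. The sketch below spells that out with standard traits
only; is_object_by_definition and agrees_with_std are hypothetical names, and
the code does not use the new built-in itself.

```cpp
#include <type_traits>

// Object type: anything that is not a function, a reference or void.
template <typename T>
struct is_object_by_definition
    : std::integral_constant<bool, !std::is_function<T>::value &&
                                       !std::is_reference<T>::value &&
                                       !std::is_void<T>::value> {};

// Compares the hand-written definition with the library trait for one type.
template <typename T>
constexpr bool agrees_with_std() {
  return is_object_by_definition<T>::value == std::is_object<T>::value;
}

static_assert(is_object_by_definition<int>::value, "int is an object type");
static_assert(!is_object_by_definition<int &>::value, "references are not");
static_assert(!is_object_by_definition<void()>::value, "functions are not");
```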

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_object.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_OBJECT.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_object.
* g++.dg/ext/is_object.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_function built-in trait

This patch implements a built-in trait for std::is_function.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_function.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_FUNCTION.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_function.
* g++.dg/ext/is_function.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_reference built-in trait

This patch implements a built-in trait for std::is_reference.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_reference.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_REFERENCE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_reference.
* g++.dg/ext/is_reference.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_member_object_pointer built-in trait

This patch implements a built-in trait for std::is_member_object_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_object_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_OBJECT_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_object_pointer.
* g++.dg/ext/is_member_object_pointer.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_member_function_pointer built-in trait

This patch implements a built-in trait for std::is_member_function_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_function_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_FUNCTION_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_function_pointer.
* g++.dg/ext/is_member_function_pointer.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_member_pointer built-in trait

This patch implements a built-in trait for std::is_member_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_pointer.
* g++.dg/ext/is_member_pointer.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_scoped_enum built-in trait

This patch implements a built-in trait for std::is_scoped_enum.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_scoped_enum.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_SCOPED_ENUM.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_scoped_enum.
* g++.dg/ext/is_scoped_enum.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_bounded_array built-in trait

This patch implements a built-in trait for std::is_bounded_array.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_bounded_array.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_BOUNDED_ARRAY.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_bounded_array.
* g++.dg/ext/is_bounded_array.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Implement __is_array built-in trait

This patch implements a built-in trait for std::is_array.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_array.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARRAY.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_array.
* g++.dg/ext/is_array.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: trait patch tweak

As Patrick suggested elsewhere, let's move this into the default case.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Move trait
handling to default label.

c++: Accept the use of built-in trait identifiers

This patch accepts the use of built-in trait identifiers when they are
actually not used as traits.  Specifically, we check if the subsequent
token is '(' for ordinary built-in traits or is '<' only for the special
__type_pack_element built-in trait.  If those identifiers are used
differently, the parser treats them as normal identifiers.  This allows
us to accept code like: struct __is_pointer {};.
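
The decision described above can be modelled as a one-token lookahead. The
sketch below is a toy, not the parser code: TraitKind, Use and classify are
hypothetical names, and the next token is reduced to a single character.

```cpp
// What the identifier is registered as (hypothetical classification).
enum class TraitKind { None, Ordinary, TypePackElement };

// How the parser ends up treating the identifier.
enum class Use { Identifier, Trait };

// An ordinary trait needs a following '(' and __type_pack_element needs a
// following '<'; anything else is treated as a normal identifier.
constexpr Use classify(TraitKind kind, char next_token) {
  return ((kind == TraitKind::Ordinary && next_token == '(') ||
          (kind == TraitKind::TypePackElement && next_token == '<'))
             ? Use::Trait
             : Use::Identifier;
}
```

In this model "struct __is_pointer {};" sees '{' after the name, so the name
is classified as an ordinary identifier.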

gcc/cp/ChangeLog:

* parser.cc (cp_lexer_lookup_trait): Rename to ...
(cp_lexer_peek_trait): ... this.  Handle a subsequent token for
the corresponding built-in trait.
(cp_lexer_lookup_trait_expr): Rename to ...
(cp_lexer_peek_trait_expr): ... this.
(cp_lexer_lookup_trait_type): Rename to ...
(cp_lexer_peek_trait_type): ... this.
(cp_lexer_next_token_is_decl_specifier_keyword): Call
cp_lexer_peek_trait_type.
(cp_parser_simple_type_specifier): Likewise.
(cp_parser_primary_expression): Call cp_lexer_peek_trait_expr.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c-family, c++: Look up built-in traits via identifier node

Since RID_MAX soon reaches 255 and all built-in traits are used
approximately once in a C++ translation unit, this patch removes
all RID values for built-in traits and uses the identifier node to
look up the specific trait.  Rather than holding traits as keywords,
we set all trait identifiers as cik_trait, which is a new
cp_identifier_kind.  As cik_reserved_for_udlit was unused and
cp_identifier_kind is 3 bits, we replaced the unused field with the new
cik_trait.  Also, the later patch handles a subsequent token to the
built-in identifier so that we accept the use of non-function-like
built-in trait identifiers.

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Remove all mappings of
built-in traits.
* c-common.h (enum rid): Remove all RID values for built-in
traits.

gcc/cp/ChangeLog:

* cp-objcp-common.cc (names_builtin_p): Remove all RID value
cases for built-in traits.  Check for built-in traits via
the new cik_trait kind.
* cp-tree.h (enum cp_trait_kind): Set its underlying type to
addr_space_t.
(struct cp_trait): New struct to hold trait information.
(cp_traits): New array to hold a mapping to all traits.
(cik_reserved_for_udlit): Rename to ...
(cik_trait): ... this.
(IDENTIFIER_ANY_OP_P): Exclude cik_trait.
(IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
* lex.cc (cp_traits): Define its values, declared in cp-tree.h.
(init_cp_traits): New function to set cik_trait and
IDENTIFIER_CP_INDEX for all built-in trait identifiers.
(cxx_init): Call init_cp_traits function.
* parser.cc (cp_lexer_lookup_trait): New function to look up a
built-in trait by IDENTIFIER_CP_INDEX.
(cp_lexer_lookup_trait_expr): Likewise, look up an
expression-yielding built-in trait.
(cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
built-in trait.
(cp_keyword_starts_decl_specifier_p): Remove all RID value cases
for built-in traits.
(cp_lexer_next_token_is_decl_specifier_keyword): Handle
type-yielding built-in traits.
(cp_parser_primary_expression): Remove all RID value cases for
built-in traits.  Handle expression-yielding built-in traits.
(cp_parser_trait): Handle cp_trait instead of enum rid.
(cp_parser_simple_type_specifier): Remove all RID value cases
for built-in traits.  Handle type-yielding built-in traits.

Co-authored-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

c++: Sort built-in traits alphabetically

This patch sorts built-in traits alphabetically for better code
readability.

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Sort built-in traits
alphabetically.
* cp-trait.def: Likewise.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
(finish_trait_type): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Sort built-in traits alphabetically.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

[committed] Support uaddv and usubv on the H8

This patch adds uaddv/usubv support on the H8 port to speed up those pesky
builtin-overflow tests.  It's a variant of something I'd been running for a
while -- the major change between the old approach I'd been using and this
patch is this version does not expose the CC register until after reload to be
consistent with the rest of the H8 port.

The general approach is to first clear the GPR that's going to hold the
overflow status, perform the arithmetic operation (add/sub), then use addx to
move the overflow indicator (in the C bit) into the GPR holding the overflow
status.
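
A minimal model of that three-step sequence for 16-bit operands is sketched
below. AddResult and uaddv16 are hypothetical names; the carry is taken from
bit 16 of a wider sum to stand in for the C bit, and the final step models an
add-with-extend of zero.

```cpp
#include <cstdint>

// Result pair: the truncated sum and the overflow status (0 or 1).
struct AddResult {
  std::uint16_t sum;
  std::uint16_t overflow;
};

constexpr AddResult uaddv16(std::uint16_t a, std::uint16_t b) {
  std::uint16_t ovf = 0;                              // 1. clear the status GPR
  const std::uint32_t wide = std::uint32_t{a} + b;    // 2. add; bit 16 is "C"
  const std::uint16_t sum = static_cast<std::uint16_t>(wide);
  const std::uint16_t carry = static_cast<std::uint16_t>(wide >> 16);
  ovf = static_cast<std::uint16_t>(ovf + 0 + carry);  // 3. ovf = ovf + 0 + C
  return {sum, ovf};
}
```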

That's a significant improvement over the mess of logicals that's generated by
the generic code.

Handling signed overflow is possible and something I'll probably port to this
scheme at some point.  It's a bit more complex because we can't trivially move
the bit from CCR into the right position in a GPR and other quirks of the H8.

This has been regression tested on the H8 without problems.  Pushing to the
trunk.

gcc/
* config/h8300/addsub.md (uaddv<mode>4, usubv<mode>4): New expanders.
(uaddv): New define_insn_and_split plus post-reload pattern.

[committed] Provide patterns for signed bitfield extractions on H8

Inspired by Roger's work on the ARC port, this patch provides a
define_insn_and_split pattern to optimize sign extended bitfields starting at
position 0 using an approach that doesn't require shifting.

It then builds on that to provide another define_insn_and_split pattern to
support arbitrary signed bitfield extractions -- it uses a right logical shift
to move the bitfield into position 0, then the specialized pattern above to
sign extend the MSB of the field through the rest of the register.
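
The two-stage idea can be sketched in portable code. Note the assumption: the
shift-free step below uses the generic xor-and-subtract idiom, which is not
necessarily the instruction sequence the H8 patterns emit; sign_extend_low and
extract_signed are hypothetical names.

```cpp
#include <cstdint>

// Sign extend a field of `width` bits (1..32) that already sits at bit 0.
// No shift of the data is needed: mask, flip the sign bit, subtract it back.
constexpr std::int32_t sign_extend_low(std::uint32_t x, unsigned width) {
  const std::int64_t field = x & ((std::int64_t{1} << width) - 1);
  const std::int64_t sign = std::int64_t{1} << (width - 1);
  return static_cast<std::int32_t>((field ^ sign) - sign);
}

// Arbitrary field: one logical right shift moves it to bit 0, then the
// position-0 helper extends the field's MSB (pos + width must be <= 32).
constexpr std::int32_t extract_signed(std::uint32_t x, unsigned pos,
                                      unsigned width) {
  return sign_extend_low(x >> pos, width);
}
```

For example, the 4-bit field at bit 4 of 0xA5 is 0b1010, which extracts as -6.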

This is often, but certainly not always, better than a two shift approach. The
code uses the sizes of the sequences to select between the two shift approach
and single shift with extension from an arbitrary location approach.

There are certainly further improvements that could be made here, but I think
we're getting the bulk of the improvements already.

Regression tested on the H8 port without errors. Installing on the trunk.

gcc/
* config/h8300/h8300-protos.h (use_extvsi): Prototype.
* config/h8300/combiner.md: Two new define_insn_and_split patterns
to implement signed bitfield extractions.
* config/h8300/h8300.cc (use_extvsi): New function.

[committed] Fix length computation of single bit bitfield extraction on H8

Various approaches are used to optimize extracting a sign extended single bit
bitfield. The length computation of 10 bytes was conservatively correct, but
inaccurate.

In particular when the bit we want is in the low half word we don't need the
move high half to low half instruction. Account for that in the length
computation.

This was spotted when looking at regressions in the generalized signed bitfield
extraction pattern.

This has been regression tested on the H8 port.

gcc/
* config/h8300/combiner.md (single bit signed bitfield extraction): Fix
length computation when the bit we want is in the low half word.

[committed] Fix length computation for logical shifts on H8

This fixes the length computation for logical shifts on the H8/SX.

The H8/SX has a richer set of logical shifts compared to early parts in the H8
family. It has special 2 byte instructions for shifts by power of two
immediate values as well as a special 4 byte shift by other immediate values.

These were never accounted for (AFAICT) in the length computation for shifts.
Until now that's mostly just affected branch shortening. But an upcoming patch
uses instruction lengths to select between two potential sequences and getting
these lengths wrong will cause it to miss optimization opportunities on the
H8/SX.

gcc
* config/h8300/h8300.cc (compute_a_shift_length): Fix computation
of logical shifts on the H8/SX.

Daily bump.

phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]

This function is never called when param_l1_cache_line_size is 0,
but it uses int and unsigned int variables to hold alignment in
bits, so for a large param_l1_cache_line_size the alignment in bits
can wrap around to zero, and e.g. DECL_ALIGN () % param_align_bits
can then divide by zero.
Looking at the code, the function uses tree_fits_uhwi_p on the trees
before converting them using tree_to_uhwi to int variables, which
looks just wrong: either it would need to punt if the values don't fit
into those and also check for overflows during the computation,
or use unsigned HOST_WIDE_INT for all of this. The latter also fixes
the division by zero, as the param_l1_cache_line_size maximum is INT_MAX,
and that multiplied by 8 will always fit.
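
The arithmetic can be checked with a small standalone sketch. It assumes a 32-bit unsigned int and uses std::uint64_t as a stand-in for unsigned HOST_WIDE_INT; the function names are hypothetical and not taken from tree-ssa-phiopt.cc.

```cpp
#include <climits>
#include <cstdint>

// Cache line size in bytes converted to bits in 32-bit unsigned arithmetic.
// For a line size of 0x20000000 the product is 2^32, which wraps to 0.
unsigned int align_bits_narrow(int line_size_bytes)
{
    return static_cast<unsigned int>(line_size_bytes) * 8u;
}

// The same conversion in a 64-bit type. Even INT_MAX * 8 fits, so the
// result can only be zero when the input is zero.
std::uint64_t align_bits_wide(int line_size_bytes)
{
    return static_cast<std::uint64_t>(line_size_bytes) * 8u;
}
```

With the narrow version, a later `x % align_bits_narrow(...)` is a division by zero for that input; with the wide version it is not.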

2023-12-09 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/112887
* tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of
param_align, param_align_bits, offset1, offset2, size2 and align1
variables from int or unsigned int to unsigned HOST_WIDE_INT.

* gcc.dg/pr112887.c: New test.

testsuite: Add testcase for already fixed PR [PR112924]

This testcase got fixed with
r14-6132-g50f2a3370d177f8fe9bea0461feb710523e048a2 .
I'm just adding a testcase so that it doesn't reappear.

2023-12-09 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/112924
* gcc.dg/pr112924.c: New test.

libstdc++: Fix value of __cpp_lib_format macro [PR111826]

As noted in the PR, we support both features required for the 202110L
value, so we should define it with that value.

libstdc++-v3/ChangeLog:

PR libstdc++/111826
* include/bits/version.def (format): Update value.
* include/bits/version.h: Regenerate.
* testsuite/std/format/functions/format.cc:

libstdc++: Fix resolution of LWG 4016 for std::ranges::to [PR112876]

What I implemented in r14-6199-g45630fbcf7875b does not match what I
proposed for LWG 4016, and it imposes additional, unwanted requirements
on the emplace and insert member functions of the container being
populated.

libstdc++-v3/ChangeLog:

PR libstdc++/112876
* include/std/ranges (ranges::to): Do not try to use an iterator
returned by the container's emplace or insert member functions.
* testsuite/std/ranges/conv/1.cc (Cont4::emplace, Cont4::insert):
Use the iterator parameter. Do not return an iterator.

driver: Fix memory leak [PR93019]

driver::finalize, used by the JIT, clears the mdswitches pointer; if it was
allocated before, that leaks the memory.

2023-12-09 Costas Argyris <costas.argyris@gmail.com>
Jakub Jelinek <jakub@redhat.com>

PR driver/93019
* gcc.cc (driver::finalize): Call XDELETEVEC on mdswitches before
clearing it.

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>

c++: Don't diagnose ignoring of attributes if all ignored attributes are attribute_ignored_p

There is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or how the attribute chain var is called) is non-NULL,
but if it is non-NULL and contains at least one non-attribute_ignored_p
attribute?
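
The proposed check can be modelled in a few lines. This is a toy model only: the struct and function below are hypothetical stand-ins, whereas the real helper added by this patch works on GCC's attribute chains.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy attribute: a name plus whether -Wno-attributes= asked to ignore it.
struct ToyAttr {
    std::string name;
    bool ignored;
};

// True if the list contains at least one attribute that was not ignored.
// An empty list, or one made only of ignored attributes, yields false, so
// the generic "attributes ignored" warning would be suppressed.
bool any_nonignored(const std::vector<ToyAttr> &attrs)
{
    return std::any_of(attrs.begin(), attrs.end(),
                       [](const ToyAttr &a) { return !a.ignored; });
}
```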

I've kept warnings for cases where the C++ standard says explicitly any
attributes aren't ok -
"If an attribute-specifier-seq appertains to a friend declaration, that
declaration shall be a definition."
or
https://eel.is/c++draft/dcl.type.elab#3
or
https://eel.is/c++draft/temp.spec#temp.explicit-3

For some changes I haven't figured out how I could cover them in the
testsuite.

Note, C uses a different strategy, it has c_warn_unused_attributes
function which warns about all the attributes one by one unless they
are ignored (or allowed in certain position).
Though that is just a single diagnostic wording, while C++ FE just warns
that there are some ignored attributes and doesn't name them individually
(except for namespace and using namespace) and uses different wordings in
different spots.

2023-12-09 Jakub Jelinek <jakub@redhat.com>

gcc/
* attribs.h (any_nonignored_attribute_p): Declare.
* attribs.cc (any_nonignored_attribute_p): New function.
gcc/cp/
* parser.cc (cp_parser_statement, cp_parser_expression_statement,
cp_parser_declaration, cp_parser_asm_definition): Don't diagnose
ignored attributes if !any_nonignored_attribute_p.
* decl.cc (grokdeclarator): Likewise.
* name-lookup.cc (handle_namespace_attrs, finish_using_directive):
Don't diagnose ignoring of attr_ignored_p attributes.
gcc/testsuite/
* g++.dg/warn/Wno-attributes-1.C: New test.

RISC-V: Fix VLS mode movmisalign bug

PR112932 made me notice a bug in the current VLS mode misalign pattern.
Adapt it to match the VLA mode pattern.

Committed as an obvious fix.

PR target/112932

gcc/ChangeLog:

* config/riscv/vector.md (movmisalign<mode>): Fix VLS mode bugs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr112932.c: New test.

testsuite: Remove gcc.dg/tree-ssa/scev-3.c -4.c and 5.c

These tests were recently xfailed on ilp32 targets though
passing on almost all ilp32 targets (known exceptions: ia32
and some arm subtargets). They've been changed around too
much to remain useful.

PR testsuite/112786
* gcc.dg/tree-ssa/scev-3.c, gcc.dg/tree-ssa/scev-4.c,
gcc.dg/tree-ssa/scev-5.c: Remove.

strub: skip emutls after strubm errors

The emutls pass requires PROP_ssa, but if the strubm pass (or any
other pre-SSA pass) issues errors, all of the build_ssa_passes are
skipped, so the property is not set, but emutls still attempts to run,
on targets that use it, despite earlier errors, so it hits the
unsatisfied requirement.

Adjust emutls to be skipped in case of earlier errors.

for gcc/ChangeLog

* tree-emutls.cc: Include diagnostic-core.h.
(pass_ipa_lower_emutls::gate): Skip if errors were seen.

Daily bump.

c++: decltype of (non-captured variable) [PR83167]

For decltype((x)) within a lambda where x is not captured, we dubiously
require that the lambda has a capture default, unlike for decltype(x).
But according to [expr.prim.id.unqual]/3 we should just ignore the lambda
in this case.  This patch narrowly fixes this issue by disabling the
capture_decltype handling and falling back to the ordinary handling when
the innermost lambda has no capture-default.  In fact, we can restrict
the special handling to only by-copy lambdas since that's what
[expr.prim.id.unqual]/3 is concerned with; for by-ref implicit captures
both code paths should give the same result anyway.
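
The by-copy behaviour that [expr.prim.id.unqual]/3 describes can be checked with static_asserts. This is a minimal sketch with a hypothetical function name; it only covers lambdas with a by-copy capture-default. Per this patch, in a lambda with no capture-default, decltype((x)) ignores the lambda and is int&; that case is left as a comment because compilers without the fix reject it.

```cpp
#include <type_traits>

// Returns true; the interesting checks are the static_asserts.
bool decltype_in_lambda_demo()
{
    int x = 0;
    (void) x;  // x is only named in unevaluated operands below

    auto by_copy = [=] {
        // Unparenthesized: the declared type of x.
        static_assert(std::is_same<decltype(x), int>::value, "");
        // Parenthesized: as if x had been captured by copy into a closure
        // whose call operator is const.
        static_assert(std::is_same<decltype((x)), const int &>::value, "");
    };
    by_copy();

    auto by_copy_mutable = [=]() mutable {
        // A mutable lambda drops the const.
        static_assert(std::is_same<decltype((x)), int &>::value, "");
    };
    by_copy_mutable();

    // Not compiled here: with this patch, inside `[] { ... }` the type
    // decltype((x)) is int&, the same as outside the lambda.
    return true;
}
```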

During review some other issues were discovered which are documented in
a new FIXME.

PR c++/83167

gcc/cp/ChangeLog:

* semantics.cc (capture_decltype): Inline into its only caller ...
(finish_decltype_type): ... here.  Update nearby comment to refer
to recent standard.  Add FIXME.  Restrict uncaptured variable type
transformation to happen only for lambdas with a by-copy
capture-default.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype4.C: New test.

analyzer: avoid taint for (TAINTED % NON_TAINTED)

gcc/analyzer/ChangeLog:
* sm-taint.cc (taint_state_machine::alt_get_inherited_state): Fix
handling of TRUNC_MOD_EXPR.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/taint-modulus-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix ICE on infoleak with poisoned size

gcc/analyzer/ChangeLog:
* region-model.cc (contains_uninit_p): Only check for
svalues that the infoleak warning can handle.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/infoleak-uninit-size-1.c: New test.
* gcc.dg/plugin/infoleak-uninit-size-2.c: New test.
* gcc.dg/plugin/plugin.exp: Add the new tests.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[PR112875][LRA]: Fix an assert in lra elimination code

The PR112875 test ran into a wrong assert (gcc_unreachable) in elimination
in a debug insn. The insn seems OK, so I changed the assertion.
To be more accurate, I made it the same as the analogous reload pass code.

gcc/ChangeLog:

PR rtl-optimization/112875
* lra-eliminations.cc (lra_eliminate_regs_1): Change an assert.
Add ASM_OPERANDS case.

gcc/testsuite/ChangeLog:

PR rtl-optimization/112875
* gcc.target/i386/pr112875.c: New test.

c++: Fix parsing [[]][[]];

When working on the previous patch I put [[]] [[]] asm (""); into a
testcase, but was surprised it wasn't parsed.
The problem is that when cp_parser_std_attribute_spec returns NULL, it
can mean two different things: either the next token(s) are neither
[[ nor alignas (in that case the caller should break from the loop),
or we parsed something like [[]], which is a valid attribute specifier
that doesn't specify any attributes.

The following patch fixes that by returning a magic value of void_list_node
when the first tokens are neither [[ nor alignas, so that
cp_parser_std_attribute_spec_seq knows to stop iterating. That differentiates
it from NULL_TREE, which means that some attribute specifier has been parsed
but it didn't contain any (or any valid) attributes.
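
The sentinel idea can be sketched with a toy parser. Everything here is hypothetical and much simpler than parser.cc: each attribute specifier is a single string token, and a three-valued enum plays the role of void_list_node versus NULL_TREE versus a real attribute list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Spec {
    NotASpecifier,  // role of void_list_node: caller should stop
    Empty,          // role of NULL_TREE: valid specifier, no attributes
    HasAttribute    // a specifier that names at least one attribute
};

// Classify one token such as "[[]]", "[[a]]" or "asm".
Spec parse_spec(const std::string &tok)
{
    if (tok.size() < 4 || tok.compare(0, 2, "[[") != 0
        || tok.compare(tok.size() - 2, 2, "]]") != 0)
        return Spec::NotASpecifier;
    return tok.size() == 4 ? Spec::Empty : Spec::HasAttribute;
}

// Consume the leading attribute-specifier-seq and return how many tokens
// it spans. An empty specifier must not end the loop.
std::size_t parse_spec_seq(const std::vector<std::string> &toks)
{
    std::size_t i = 0;
    for (; i < toks.size(); ++i) {
        const Spec s = parse_spec(toks[i]);
        if (s == Spec::NotASpecifier)
            break;     // nothing more to parse here
        if (s == Spec::Empty)
            continue;  // keep going past [[]]
        // A real parser would chain the parsed attributes here.
    }
    return i;
}
```

If Empty and NotASpecifier were the same value, the loop would stop after the first [[]] and the second one in `[[]] [[]] asm ("");` would be left unparsed.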

2023-12-08 Jakub Jelinek <jakub@redhat.com>

* parser.cc (cp_parser_std_attribute_spec): Return void_list_node
rather than NULL_TREE if token is neither CPP_OPEN_SQUARE nor
RID_ALIGNAS CPP_KEYWORD.
(cp_parser_std_attribute_spec_seq): For attr_spec == void_list_node
break, for attr_spec == NULL_TREE continue.

* g++.dg/cpp0x/gen-attrs-79.C: New test.

c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]

The following testcase is miscompiled because two ubsan instrumentations
run into each other.
The first one is the shift instrumentation.  Before the C++ FE calls
it, it wraps the 2 shift arguments with cp_save_expr, so that side-effects
in them aren't evaluated multiple times.  And, ubsan_instrument_shift
itself uses unshare_expr on any uses of the operands to make sure further
modifications in them don't affect other copies of them (the only not
unshared ones are the one the caller then uses for the actual operation
after the instrumentation, which means there is no tree sharing).

Now, if there are side-effects in the first operand like say function
call, cp_save_expr wraps it into a SAVE_EXPR, and ubsan_instrument_shift
in this mode emits something like
if (..., SAVE_EXPR <foo ()>, SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <foo ()>, ...);
and caller adds
SAVE_EXPR <foo ()> << SAVE_EXPR <op1>
after it in a COMPOUND_EXPR.  So far so good.

If there are no side-effects and cp_save_expr doesn't create SAVE_EXPR,
everything is ok as well because of the unshare_expr.
We have
if (..., SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., ptr->something[i], ...);
and
ptr->something[i] << SAVE_EXPR <op1>
where ptr->something[i] is unshared.

In the testcase below, the !x->s[j] ? 1 : 0 expression is wrapped initially
into a SAVE_EXPR though, and unshare_expr doesn't unshare SAVE_EXPRs nor
anything used in them for obvious reasons, so we end up with:
if (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, ...);
and
SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0> << SAVE_EXPR <op1>
So far good as well.  But later during cp_fold of the SAVE_EXPR we find
out that VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1 is actually
invariant (has TREE_READONLY set) and so cp_fold simplifies the above to
if (..., SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., (bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1, ...);
and
((bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1) << SAVE_EXPR <op1>
with the s[j] ARRAY_REFs and other expressions shared in between the two
uses (and obviously the expression optimized away from the COMPOUND_EXPR in
the if condition).

Then comes another ubsan instrumentation at genericization time,
this time to instrument the ARRAY_REFs with strict bounds checking,
and replaces the s[j] in there with s[.UBSAN_BOUNDS (0B, SAVE_EXPR<j>, 8), SAVE_EXPR<j>]
As the trees are shared, it does that just once though.
And as the if body is gimplified first, the SAVE_EXPR<j> is evaluated inside
of the if body and when it is used again after the if, it uses a potentially
uninitialized value of j.1 (always uninitialized if the shift count isn't
out of bounds).

The following patch fixes that by unshare_expr unsharing the folded argument
of a SAVE_EXPR if we've folded the SAVE_EXPR into an invariant and it is
used more than once.

2023-12-08  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/112727
* cp-gimplify.cc (cp_fold): If SAVE_EXPR has been previously
folded, unshare_expr what is returned.

* c-c++-common/ubsan/pr112727.c: New test.

c++: Add fixed test [PR88848]

This one was fixed by r12-7714-g47da5198766256.

PR c++/88848

gcc/testsuite/ChangeLog:

* g++.dg/inherit/multiple2.C: New test.

c++: guard more against undiagnosed error_mark_node [PR112658]

This adds a sanity check to cp_parser_expression_statement similar to
the one in finish_expr_stmt added by r6-6795-g0fd9d4921f7ba2, which
effectively downgrades accepts-invalid/wrong-code bugs like this one
into ice-on-invalid/ice-on-valid ones.

PR c++/112658

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): If the statement
is error_mark_node, make sure we've seen_error().

c++: undiagnosed error_mark_node from cp_build_c_cast [PR112658]

When cp_build_c_cast commits to an erroneous const_cast, we neglect to
replay errors from build_const_cast_1 which can lead to us incorrectly
accepting (and "miscompiling") the cast, or triggering the assert in
finish_expr_stmt.

This patch fixes this oversight.  This was the original fix for the ICE
in PR112658 before r14-5941-g305a2686c99bf9 made us accept the testcase
there after all.  I wasn't able to come up with an alternate testcase for
which this fix has an effect anymore, but below is a reduced version of
the PR112658 testcase (accepted ever since r14-5941) for good measure.

PR c++/112658
PR c++/94264

gcc/cp/ChangeLog:

* typeck.cc (cp_build_c_cast): If we're committed to a const_cast
and the result is erroneous, call build_const_cast_1 a second
time to issue errors.  Use complain=tf_none instead of =false.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-array20.C: New test.

RISC-V: Add vectorized strcmp and strncmp.

This patch adds vectorized strcmp and strncmp implementations and
tests. Similar to strlen, expansion is still guarded by
-minline-str(n)cmp.

gcc/ChangeLog:

PR target/112109

* config/riscv/riscv-protos.h (expand_strcmp): Declare.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
strategy handling and delegation to scalar and vector expanders.
(expand_strcmp): Vectorized implementation.
* config/riscv/riscv.md: Add TARGET_VECTOR to strcmp and strncmp
expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strncmp.c: New test.

RISC-V: Add vectorized strlen.

This patch implements a vectorized strlen by re-using and slightly
adjusting the rawmemchr implementation. Rawmemchr returns the address
of the needle while strlen returns the difference between needle address
and start address.
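
The relationship between the two functions can be shown in scalar code. This is a sketch only: the helper names are hypothetical and the loop is a plain byte scan, not the vectorized expansion.

```cpp
#include <cstddef>

// Scalar stand-in for rawmemchr: return the address of the first byte
// equal to `needle`. Like rawmemchr, it assumes the byte is present.
const unsigned char *scalar_rawmemchr(const unsigned char *s, unsigned char needle)
{
    while (*s != needle)
        ++s;
    return s;
}

// strlen is the distance from the start address to the terminating NUL.
std::size_t strlen_via_rawmemchr(const char *s)
{
    const unsigned char *start = reinterpret_cast<const unsigned char *>(s);
    return static_cast<std::size_t>(scalar_rawmemchr(start, 0) - start);
}
```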

As before, strlen expansion is guarded by -minline-strlen.

While testing with -minline-strlen I encountered a vsetvl problem in
memcpy-chk.c where we didn't insert a vsetvl at the proper spot (after
a setjmp). This needs to be fixed separately and I figured I'd post
this patch as-is.

gcc/ChangeLog:

PR target/112109

* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
parameter.
* config/riscv/riscv-string.cc (riscv_expand_strlen): Call
rawmemchr.
(expand_rawmemchr): Add strlen handling.
* config/riscv/riscv.md: Add TARGET_VECTOR to strlen expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.

aarch64: Some tweaks to the early-ra pass

early-ra's likely_operand_match_p didn't handle relaxed and special
memory constraints, which meant that the pass wasn't able to match
LD1RQ instructions to their constraints, and so backed out of
trying to allocate.  This patch fixes that by switching the sense
of the match: does the rtx seem appropriate for the constraint?,
rather than: does the constraint seem appropriate for the rtx?

Also, I came across a case that needed more general equivalence
detection.  Previously we would only record equivalences after
the last definition of the source register, but it's worth trying
to handle cases where the destination register's live range is
restricted to a block, and the next definition of the source
occurs only after the end of the destination register's live range.

The patch also fixes a cut-&-pasto that Alex noticed (thanks).

gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::chain_next):
Put into an enum with...
(allocno_info::last_def_point): ...new member variable.
(allocno_info::m_current_bb_point): New member variable.
(likely_operand_match_p): Switch based on get_constraint_type,
rather than based on rtx code.  Handle relaxed and special memory
constraints.
(early_ra::record_copy): Allow the source of an equivalence to be
assigned to more than once.
(early_ra::record_allocno_use): Invalidate any previous equivalence.
Initialize last_def_point.
(early_ra::record_allocno_def): Set last_def_point.
(early_ra::valid_equivalence_p): New function, split out from...
(early_ra::record_copy): ...here.  Use last_def_point to handle
source registers that have a later definition.
(make_pass_aarch64_early_ra): Fix comment.

gcc/testsuite/
* gcc.target/aarch64/sme/strided_2.c: New test.

Revert "arm: vld1q_types_x2 ACLE intrinsics"

This reverts commit a1a0cdf21bb6a076e98658d815645d8ad1193840.

Revert "arm: vld1q_types_x3 ACLE intrinsics"

This reverts commit 2514a331835e055a963fd059dc5770e5ae500af0.

Revert "arm: vld1q_types_x4 ACLE intrinsics"

This reverts commit ac827ec3e600bcb636f564876b186ee19d384a1e.

Revert "arm: vst1_types_x2 ACLE intrinsics"

This reverts commit a69a7c7b6782c5b6f213f1f34af8dbb6541f27bb.

Revert "arm: vst1_types_x3 ACLE intrinsics"

This reverts commit ef07ae652c25ec04c2e3ef8cec14b0771a809861.

Revert "arm: vst1_types_x4 ACLE intrinsics"

This reverts commit 2f48d846c794ba091b266133f73717361096d454.

Revert "arm: vst1q_types_x2 ACLE intrinsics"

This reverts commit 2cd0d0261ef9d0e13e20407f131f32dcb67fcdd3.

Revert "arm: vst1q_types_x3 ACLE intrinsics"

This reverts commit 2d58d53c9e0eed83faa9254f8d3ec0ddd54812d8.

Revert "arm: vst1q_types_x4 ACLE intrinsics"

This reverts commit 4ad77f883c178679f1dbb3a5603f811e022080bb.

Revert "arm: vld1_types_x2 ACLE intrinsics"

This reverts commit 8fff3f065277f13176c320f22c4ed766a82c5d8e.

Revert "arm: vld1_types_x3 ACLE intrinsics"

This reverts commit 8e3ae874b21bdd8da32afefa6f6f60913481564c.

Revert "arm: vld1_types_x4 ACLE intrinsics"

This reverts commit 656f092cba951fddc1e40468ad71d241ffe98566.

libgcov: Call __builtin_fork instead of fork

Some targets do not provide a prototype for fork, and compilation now
fails with an implicit-function-declaration error.

libgcc/

* libgcov-interface.c (__gcov_fork): Use __builtin_fork instead
of fork.

OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.

With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.

In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array descriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true if it was previously added but also removes it from the list.
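
The record keeping for scalars behaves like a set with a consuming lookup. The class below is a toy model with hypothetical names; libgomp keeps the addresses in a splay tree and exposes plain C functions instead.

```cpp
#include <cstdint>
#include <set>

// Toy model of the GOMP_add_alloc / GOMP_is_alloc bookkeeping.
class AllocRegistry {
public:
    // Remember that ptr was obtained from the OpenMP allocator.
    void add_alloc(const void *ptr)
    {
        ptrs_.insert(reinterpret_cast<std::uintptr_t>(ptr));
    }

    // True if ptr was registered. The entry is removed as a side effect,
    // so a second query for the same pointer returns false.
    bool is_alloc(const void *ptr)
    {
        return ptrs_.erase(reinterpret_cast<std::uintptr_t>(ptr)) == 1;
    }

private:
    std::set<std::uintptr_t> ptrs_;
};
```

A deallocation path would query is_alloc once and pick the matching free routine based on the answer.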

Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is now part of
omp-builtins.def and libgomp gains the two new functions mentioned above.

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
* builtins.cc (builtin_fnspec): Handle it.
* gimple-ssa-warn-access.cc (fndecl_alloc_p,
matching_alloc_calls_p): Likewise.
* gimple.cc (nonfreeing_call_p): Likewise.
* predict.cc (expr_expected_value_1): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
* tree.cc (fndecl_dealloc_argno): Likewise.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
and EXEC_OMP_ALLOCATORS.
* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
(-fopenmp-simd): New, based on C version.
(-fopenmp-allocators): New.
* lang.opt (fopenmp-allocators): Add.
* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
add target and no dynamic_allocators diagnostic and more invalid
diagnostic.
* parse.cc (decode_omp_directive): Set contains_teams_construct.
* trans-array.h (gfc_array_allocate): Update prototype.
(gfc_conv_descriptor_version): New prototype.
* trans-decl.cc (gfc_init_default_dt): Fix comment.
* trans-array.cc (gfc_conv_descriptor_version): New.
(gfc_array_allocate): Support GOMP_alloc allocation.
(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
Handle GOMP_free/omp_realloc as needed.
* trans-expr.cc (gfc_conv_procedure_call): Likewise.
(alloc_scalar_allocatable_for_assignment): Likewise.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
* trans-openmp.cc (gfc_trans_omp_allocators,
gfc_trans_omp_directive): Handle allocators/allocate directive.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
* trans-stmt.h (gfc_trans_allocate): Update prototype.
* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
Update to handle GOMP_alloc.
(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
Handle GOMP_free.
(trans_code): Update call.
* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
Update prototype.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.

libgomp/ChangeLog:

* allocator.c (struct fort_alloc_splay_tree_key_s,
fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
(GOMP_5.1.1): ... here.
* libgomp.texi (Impl. Status, Memory management): Update for
allocators/allocate directives.
* splay-tree.c: Handle splay_tree_static define to declare all
functions as static.
(splay_tree_lookup_node): New.
* splay-tree.h: Handle splay_tree_decl_only define.
(splay_tree_lookup_node): New prototype.
* target.c: Define splay_tree_static for 'reverse'.
* testsuite/libgomp.fortran/allocators-1.f90: New test.
* testsuite/libgomp.fortran/allocators-2.f90: New test.
* testsuite/libgomp.fortran/allocators-3.f90: New test.
* testsuite/libgomp.fortran/allocators-4.f90: New test.
* testsuite/libgomp.fortran/allocators-5.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-14.f90: Add coarray and
not-listed tests.
* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
* gfortran.dg/bind_c_array_params_2.f90: Update expected
dump for dtype '.version=0'.
* gfortran.dg/gomp/allocate-16.f90: New test.
* gfortran.dg/gomp/allocators-3.f90: New test.
* gfortran.dg/gomp/allocators-4.f90: New test.

libgcc: Fix config.in

It was updated incorrectly in

  commit dbbfb52b0e9c66ee9d05b8fd17c4f44655e48463
  Author:     Szabolcs Nagy <szabolcs.nagy@arm.com>
  CommitDate: 2023-12-08 11:29:06 +0000

    libgcc: aarch64: Configure check for __getauxval

so regenerate it.

libgcc/ChangeLog:

* config.in: Regenerate.

libgcc: aarch64: Add SME unwinder support

To support the ZA lazy save scheme, the PCS requires the unwinder to
reset the SME state to PSTATE.SM=0, PSTATE.ZA=0, TPIDR2_EL0=0 on entry
to an exception handler. We use the __arm_za_disable SME runtime call
unconditionally to achieve this.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#exceptions

The hidden alias is used to avoid a PLT and avoid inconsistent VPCS
marking (we don't rely on special PCS at the call site). In case of
static linking the SME runtime init code is linked in code that raises
exceptions.

libgcc/ChangeLog:

* config/aarch64/__arm_za_disable.S: Add hidden alias.
* config/aarch64/aarch64-unwind.h: Reset the SME state before
EH return via the _Unwind_Frames_Extra hook.

libgcc: aarch64: Add SME runtime support

The call ABI for SME (Scalable Matrix Extension) requires a number of
helper routines which are added to libgcc so they are tied to the
compiler version instead of the libc version. See
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines

The routines are in shared libgcc and static libgcc eh, even though
they are not related to exception handling. This is to avoid linking
a copy of the routines into dynamic linked binaries, because the
TPIDR2_EL0 block can be extended in the future, which is better handled
in a single place per process.

The support routines have to decide if SME is accessible or not. Linux
tells userspace if SME is accessible via AT_HWCAP2; otherwise a new
__aarch64_sme_accessible symbol was introduced that a libc can define.
Due to the libgcc and libc build order, the symbol availability cannot be
checked. For __aarch64_sme_accessible an unistd.h feature test macro
is used, but no such detection mechanism is available for __getauxval,
so we rely on configure checks based on the target triplet.

Asm helper code is added to make writing the routines easier.

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Add sources to the build.
* config/aarch64/__aarch64_have_sme.c: New file.
* config/aarch64/__arm_sme_state.S: New file.
* config/aarch64/__arm_tpidr2_restore.S: New file.
* config/aarch64/__arm_tpidr2_save.S: New file.
* config/aarch64/__arm_za_disable.S: New file.
* config/aarch64/aarch64-asm.h: New file.
* config/aarch64/libgcc-sme.ver: New file.

libgcc: aarch64: Configure check for __getauxval

Add configure check for the __getauxval ABI symbol, which is always
available on aarch64 glibc, and may be available on other linux C
runtimes. For now it is only enabled on glibc; others have to override it:

target_configargs=libgcc_cv_have___getauxval=yes

This is deliberately obscure as it should be auto detected, ideally
via a feature test macro in unistd.h (link time detection is not
possible since the libc may not be installed at libgcc build time),
but currently there is no such feature test mechanism.

Without __getauxval, libgcc cannot do runtime CPU feature detection
and has to assume only the build time known features are available.

libgcc/ChangeLog:

* config.in: Undef HAVE___GETAUXVAL.
* configure: Regenerate.
* configure.ac: Check for __getauxval.

libgcc: aarch64: Configure check for .variant_pcs support

Ideally SME support routines in libgcc are marked as variant PCS symbols
so check if as supports the directive.

libgcc/ChangeLog:

* config.in: Undef HAVE_AS_VARIANT_PCS.
* configure: Regenerate.
* configure.ac: Check for .variant_pcs.

tree-optimization/112909 - uninit diagnostic with abnormal copy

The following avoids spurious uninit diagnostics for SSA name
copies which mostly appear when the source is marked as abnormal
which prevents copy propagation.

To prevent regressions I remove the bail out for anonymous SSA
names in the PHI arg place from warn_uninitialized_phi leaving
that to warn_uninit where I handle SSA copies from a SSA name
which isn't anonymous. In theory this might cause more
valid and false positive diagnostics to pop up.

PR tree-optimization/112909
* tree-ssa-uninit.cc (find_uninit_use): Look through a
single level of SSA name copies with single use.

* gcc.dg/uninit-pr112909.c: New testcase.

Revert "testsuite: require avx_runtime for some tests"

This reverts commit 249404649d26f544d1ad6808625807532c2b6a42.

LoongArch: Fix ICE and use simplify_gen_subreg instead of gen_rtx_SUBREG directly.

loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are not supported
in GCC and cause an ICE:

ice.c:55:1: error: unrecognizable insn:
   55 | }
      | ^
(insn 63 62 64 8 (set (reg:V4DI 278)
        (subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
     (nil))
during RTL pass: vregs
ice.c:55:1: internal compiler error: in extract_insn, at recog.cc:2804

Ruoyao previously fixed a similar ICE:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636156.html

This patch fixes the ICE and uses simplify_gen_subreg instead of gen_rtx_SUBREG wherever possible
to keep the same ICE from happening again.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const): Use
simplify_gen_subreg instead of gen_rtx_SUBREG.
(loongarch_expand_vec_perm_const_2): Ditto.
(loongarch_expand_vec_cond_expr): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr112476-3.c: New test.
* gcc.target/loongarch/pr112476-4.c: New test.

LoongArch: Fix lsx-vshuf.c and lasx-xvshuf_b.c tests fail on LA664 [PR112611]

For [x]vshuf instructions, if the index value in the selector exceeds 63, it triggers
undefined behavior on LA464, but not on LA664. To ensure compatibility of these two
tests on both LA464 and LA664, we have modified both tests to ensure that the index
value in the selector does not exceed 63.

gcc/testsuite/ChangeLog:

PR target/112611
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Ensure index is less than 64.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Ditto.

LoongArch: Vectorized loop unrolling is disabled for divf/sqrtf/rsqrtf when -mrecip is enabled.

Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number
of generated instructions is close to or exceeds the maximum number of instructions that
LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor):
If m_has_recip is true, return 1 as the unroll factor.
(loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence.

LoongArch: New options -mrecip and -mrecip= with ffast-math.

When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal
instructions and approximate reciprocal square root instructions with additional
Newton-Raphson steps to implement single precision floating-point division, square
root and reciprocal square root operations, for a better performance.
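
As a rough illustration of the refinement step described above, the following sketch applies one Newton-Raphson iteration to an approximate reciprocal and to an approximate reciprocal square root. The function names and the single-step form are assumptions made for illustration only; the instruction sequences that GCC actually emits for -mrecip may differ.

```cpp
// Illustrative only: these names are hypothetical, not GCC internals.

// One Newton-Raphson step for 1/a, given a rough estimate x0:
//   x1 = x0 * (2 - a * x0)
float nr_recip_step(float a, float x0) {
  return x0 * (2.0f - a * x0);
}

// One Newton-Raphson step for 1/sqrt(a), given a rough estimate y0:
//   y1 = y0 * (1.5 - 0.5 * a * y0 * y0)
float nr_rsqrt_step(float a, float y0) {
  return y0 * (1.5f - 0.5f * a * y0 * y0);
}

// Division n / d expressed as n * (1/d), using a refined reciprocal of d.
float approx_div(float n, float d, float recip_estimate) {
  return n * nr_recip_step(d, recip_estimate);
}
```

For instance, refining the estimate 0.3 for 1/3 yields about 0.33, and refining 0.4 for 1/sqrt(4) yields about 0.472. Each step moves the estimate closer to the exact value without issuing a full-precision divide or square root.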

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lasx.md (div<mode>3): New expander.
(*div<mode>3): Rename.
(sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.cc (loongarch_option_override_internal): Set
recip_mask for -mrecip and -mrecip= options.
(loongarch_emit_swrsqrtsf): New function.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT,
RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT,
RECIP_MASK_ALL): New bitmasks.
(TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV,
TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests.
* config/loongarch/loongarch.md (sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/loongarch.opt (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lsx.md (div<mode>3): New expander.
(*div<mode>3): Rename.
(sqrt<mode>2): New expander.
(*sqrt<mode>2): Rename.
(rsqrt<mode>2): New expander.
* config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate.
* doc/invoke.texi (LoongArch Options): Document new options.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/divf.c: New test.
* gcc.target/loongarch/recip-divf.c: New test.
* gcc.target/loongarch/recip-sqrtf.c: New test.
* gcc.target/loongarch/sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test.

LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

Redefine the patterns for [x]vfrecip instructions to use rtx code instead of unspec, and enable
[x]vfrecip instructions to be generated during auto-vectorization.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrecip_<flasxfmt>): Renamed to ..
(recip<mode>3): .. this.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine
to new pattern name.
(CODE_FOR_lsx_vfrecip_s): Ditto.
(CODE_FOR_lasx_xvfrecip_d): Ditto.
(CODE_FOR_lasx_xvfrecip_s): Ditto.
(loongarch_expand_builtin_direct): For the vector recip instructions, construct a
temporary parameter const1_vector.
* config/loongarch/lsx.md (lsx_vfrecip_<flsxfmt>): Renamed to ..
(recip<mode>3): .. this.
* config/loongarch/predicates.md (const_vector_1_operand): New predicate.

LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt<mode>2 to align with standard
pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrsqrt_<flasxfmt>): Renamed to ..
(rsqrt<mode>2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name.
(CODE_FOR_lsx_vfrsqrt_s): Ditto.
(CODE_FOR_lasx_xvfrsqrt_d): Ditto.
(CODE_FOR_lasx_xvfrsqrt_s): Ditto.
* config/loongarch/loongarch.cc (use_rsqrt_p): New function.
(loongarch_optab_supported_p): Ditto.
(TARGET_OPTAB_SUPPORTED_P): New hook.
* config/loongarch/loongarch.md (*rsqrt<mode>a): Remove.
(*rsqrt<mode>2): New insn pattern.
(*rsqrt<mode>b): Remove.
* config/loongarch/lsx.md (lsx_vfrsqrt_<flsxfmt>): Renamed to ..
(rsqrt<mode>2): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test.

LoongArch: Add support for LoongArch V1.1 approximate instructions.

This patch adds define_insn/builtins/intrinsics for these instructions, and adds the option
-mfrecipe to control instruction generation.

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in (frecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
* config/loongarch/lasx.md (lasx_xvfrecipe_<flasxfmt>): New insn pattern.
(lasx_xvfrsqrte_<flasxfmt>): Ditto.
* config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic.
(__lasx_xvfrecipe_d): Ditto.
(__lasx_xvfrsqrte_s): Ditto.
(__lasx_xvfrsqrte_d): Ditto.
* config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates.
(LSX_EXT_BUILTIN): New macro.
(LASX_EXT_BUILTIN): Ditto.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
* config/loongarch/loongarch-c.cc: Add builtin macro "__loongarch_frecipe".
* config/loongarch/loongarch-def.cc: Regenerate.
* config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate.
* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE.
* config/loongarch/loongarch.md (loongarch_frecipe_<fmt>): New insn pattern.
(loongarch_frsqrte_<fmt>): Ditto.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/lsx.md (lsx_vfrecipe_<flsxfmt>): New insn pattern.
(lsx_vfrsqrte_<flsxfmt>): Ditto.
* config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic.
(__lsx_vfrecipe_d): Ditto.
(__lsx_vfrsqrte_s): Ditto.
(__lsx_vfrsqrte_d): Ditto.
* doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test.

Shrink out-of-SSA dump

The following removes the second GIMPLE function dump after
remove_ssa_form, which used to rewrite the IL with the coalescing
result but has not done so for a long time.

* tree-outof-ssa.cc (rewrite_out_of_ssa): Dump GIMPLE once only,
after final IL adjustments.

RISC-V: Fix ICE for incorrect mode attr in V_F2DI_CONVERT_BRIDGE

The mode attr V_F2DI_CONVERT_BRIDGE is designed to convert a floating-point
mode to the widened floating-point mode. But we used (RVVM1HF "RVVM2SI") by
mistake.

This patch fixes it by replacing
(RVVM1HF "RVVM2SI") with (RVVM1HF "RVVM2SF") as designed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Replace RVVM2SI with RVVM2SF
for mode attr V_F2DI_CONVERT_BRIDGE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-ice-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

LoongArch: Add support for xorsign.

This patch adds support for the xorsign pattern for scalar fp and vector modes. The
new expanders uniformly use vector bitwise logical operations to handle xorsign.

On LoongArch64, floating-point registers and vector registers share the same registers,
so this patch also allows conversion between LSX vector mode and scalar fp mode to
avoid unnecessary instruction generation.
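
The bitwise formulation of xorsign can be sketched in scalar form as follows. This is a minimal illustration, assuming IEEE 754 binary32 `float`; the helper name `xorsign_f32` is hypothetical and is not the expander added by the patch.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// xorsign(x, y): flip the sign of x when y is negative, i.e. XOR the
// sign bit of y into x.  Only bitwise operations are needed.
float xorsign_f32(float x, float y) {
  std::uint32_t xb, yb;
  std::memcpy(&xb, &x, sizeof xb);   // reinterpret the bits without UB
  std::memcpy(&yb, &y, sizeof yb);
  xb ^= yb & 0x80000000u;            // keep only the sign bit of y
  float result;
  std::memcpy(&result, &xb, sizeof result);
  return result;
}
```

With this definition, `xorsign_f32(2.0f, -3.0f)` is `-2.0f` and `xorsign_f32(-2.0f, -3.0f)` is `2.0f`, which matches `x * copysign(1, y)` for finite inputs.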

gcc/ChangeLog:

* config/loongarch/lasx.md (xorsign<mode>3): New expander.
* config/loongarch/loongarch.cc (loongarch_can_change_mode_class): Allow
conversion between LSX vector mode and scalar fp mode.
* config/loongarch/loongarch.md (@xorsign<mode>3): New expander.
* config/loongarch/lsx.md (@xorsign<mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-xorsign.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign.c: New test.
* gcc.target/loongarch/xorsign-run.c: New test.
* gcc.target/loongarch/xorsign.c: New test.

lower-bitint: Avoid merging non-mergeable stmt with cast and mergeable stmt [PR112902]

Before bitint lowering, the IL has:
  b.0_1 = b;
  _2 = -b.0_1;
  _3 = (unsigned _BitInt(512)) _2;
  a.1_4 = a;
  a.2_5 = (unsigned _BitInt(512)) a.1_4;
  _6 = _3 * a.2_5;
on the first function.  Now, gimple_lower_bitint has an optimization
(when not -O0) that it avoids assigning underlying VAR_DECLs for certain
SSA_NAMEs where it is possible to lower it in a single loop (or straight
line code) rather than in multiple loops.
So, e.g. the multiplication above uses handle_operand_addr, which can deal
with INTEGER_CST arguments, loads but also casts, so it is fine
not to assign an underlying VAR_DECL for SSA_NAMEs a.1_4 and a.2_5, as
the multiplication can handle it fine.
The more problematic case is the other multiplication operand.
It is again a result of a (in this case narrowing) cast, so it is fine
not to assign VAR_DECL for _3.  Normally we can merge the load (b.0_1)
with the negation (_2) and even with the following cast (_3).  If _3
was used in a mergeable operation like addition, subtraction, negation,
&|^ or equality comparison, all of b.0_1, _2 and _3 could be without
underlying VAR_DECLs.
The problem is that the current code does that even when the cast is used
by a non-mergeable operation, and handle_operand_addr certainly can't handle
the mergeable operations feeding the rhs1 of the cast, for multiplication
we don't emit any loop in which it could appear, for other operations like
shifts or non-equality comparisons we emit loops, but either in the reverse
direction or with unpredictable indexes (for shifts).
So, in order to lower the above correctly, we need to have an underlying
VAR_DECL for either _2 or _3; if we choose _2, then the load and negation
would be done in one loop and extension handled as part of the
multiplication, if we choose _3, then the load, negation and cast are done
in one loop and the multiplication just uses the underlying VAR_DECL
computed by that.
It is far easier to do this for _3, which is what the following patch
implements.
It actually already had code for most of it, just it did that for widening
casts only (optimize unless the cast rhs1 is not SSA_NAME, or is SSA_NAME
defined in some other bb, or with more than one use, etc.).
This falls through into such code even for the narrowing or same precision
casts, unless the cast is used in a mergeable operation.

2023-12-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112902
* gimple-lower-bitint.cc (gimple_lower_bitint): For a narrowing
or same precision cast don't set SSA_NAME_VERSION in m_names only
if use_stmt is mergeable_op or fall through into the check that
use is a store or rhs1 is not mergeable or other reasons prevent
merging.

* gcc.dg/bitint-52.c: New test.

vr-values: Avoid ICEs on large _BitInt cast to floating point [PR112901]

For casts from integers to floating point,
simplify_float_conversion_using_ranges uses SCALAR_INT_TYPE_MODE
and queries optabs on the optimization it wants to make.

That doesn't really work for large/huge BITINT_TYPE, those have BLKmode
which is not scalar int mode. Querying an optab is not useful for that
either.

I think it is best to just skip this optimization for those bitints,
after all, bitint lowering uses ranges already to determine minimum
precision for bitint operands of the integer to float casts.

2023-12-08 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/112901
* vr-values.cc
(simplify_using_ranges::simplify_float_conversion_using_ranges):
Return false if rhs1 has BITINT_TYPE type with BLKmode TYPE_MODE.

* gcc.dg/bitint-51.c: New test.

haifa-sched: Avoid overflows in extend_h_i_d [PR112411]

On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> Without the dg-skip-if I got on 64-bit host with
> -O3 --param min-nondebug-insn-uid=0x40000000:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes

I've looked at this and the problem is in haifa-sched.cc:
9047        h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
then treats the value as unsigned, it attempts to allocate
0xe0000374U * 152UL bytes, i.e. those 532GB.  If the above is fixed to do
3U * get_max_uid () / 2 instead, it will get slightly better and will only
need 0x60000373U * 152UL bytes, i.e. 228GB.
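
The arithmetic above can be reproduced with a small sketch. Because signed overflow is undefined behavior in C++, the signed case is modeled here with 64-bit arithmetic reduced to the 32-bit two's-complement range; the function names are hypothetical and only mirror the `3 * get_max_uid () / 2` expression.

```cpp
#include <cstdint>

// Model of `3 * uid / 2` evaluated in 32-bit signed int with wraparound.
// The wrap is computed explicitly in 64 bits to keep this sketch free of UB.
std::int32_t grow_signed_model(std::int32_t uid) {
  const std::int64_t two31 = std::int64_t{1} << 31;
  const std::int64_t two32 = std::int64_t{1} << 32;
  std::int64_t product = std::int64_t{3} * uid;               // exact value
  std::int64_t wrapped = (product + two31) % two32 - two31;   // wrap to int32 range
  return static_cast<std::int32_t>(wrapped / 2);              // truncating division
}

// The fixed form: `3U * uid / 2` is evaluated in unsigned arithmetic.
std::uint32_t grow_unsigned(std::int32_t uid) {
  return 3U * static_cast<std::uint32_t>(uid) / 2;
}
```

For `uid = 0x4000024d`, the signed model yields -536870028, which reads as 0xe0000374 when reinterpreted as unsigned, while the unsigned form yields 0x60000373.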

Perhaps more could be helped by making the vector indirect (contain pointers
to haifa_insn_data_def rather than the structures themselves) and pool allocate
those, but the more important question is how sparse are uids in normal
compilations without those large --param min-nondebug-insn-uid= parameters.
Because if they aren't sparse enough, such a change would increase compile time memory
just to help the unusual case.

2023-12-08  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112411
* haifa-sched.cc (extend_h_i_d): Use 3U instead of 3 in
3 * get_max_uid () / 2 calculation.

LoongArch: Remove the definition of ISA_BASE_LA64V110 from the code.

The instructions defined in LoongArch Reference Manual v1.1 do not constitute a v1.1
version of the instruction set. CPUs defined later may support only some of the instructions in
LoongArch Reference Manual v1.1. Therefore, the macro ISA_BASE_LA64V110 and
related definitions are removed here.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Delete STR_ISA_BASE_LA64V110.
* config/loongarch/genopts/loongarch.opt.in: Likewise.
* config/loongarch/loongarch-cpu.cc (ISA_BASE_LA64V110_FEATURES): Delete macro.
(fill_native_cpu_config): Define a new variable hw_isa_evolution to record the
extended instruction set support read from cpucfg.
* config/loongarch/loongarch-def.cc: Set evolution at initialization.
* config/loongarch/loongarch-def.h (ISA_BASE_LA64V100): Delete.
(ISA_BASE_LA64V110): Likewise.
(N_ISA_BASE_TYPES): Likewise.
(defined): Likewise.
* config/loongarch/loongarch-opts.cc: Likewise.
* config/loongarch/loongarch-opts.h (TARGET_64BIT): Likewise.
(ISA_BASE_IS_LA64V110): Likewise.
* config/loongarch/loongarch-str.h (STR_ISA_BASE_LA64V110): Likewise.
* config/loongarch/loongarch.opt: Regenerate.

LoongArch: Switch loongarch-def from C to C++ to make it possible.

We'll use HOST_WIDE_INT in LoongArch static properties in following patches.

To keep the same readability as C99 designated initializers, create a
std::array like data structure with position setter function, and add
field setter functions for structs used in loongarch-def.cc.
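
A minimal sketch of what such a data structure might look like is given below. The names `def_array` and `cache_desc` and their members are hypothetical and chosen for illustration; they are not the definitions in loongarch-def-array.h.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// A fixed-size array whose elements are value-initialized and can be set
// by position through chained calls, similar in spirit to C99 designated
// initializers such as `[1] = 10, [3] = 30`.
template <class T, std::size_t N>
class def_array {
  T arr_[N];

public:
  def_array() : arr_{} {}

  T &operator[](std::size_t i) { return arr_[i]; }
  const T &operator[](std::size_t i) const { return arr_[i]; }

  // Return a copy with element i replaced, so calls can be chained.
  def_array set(std::size_t i, T value) const {
    assert(i < N);
    def_array copy = *this;
    copy.arr_[i] = std::move(value);
    return copy;
  }
};

// A struct with field setters that return *this, so fields can be named
// at the point of initialization.
struct cache_desc {
  int line_size = 0;
  int size_kb = 0;
  cache_desc &line_size_(int v) { line_size = v; return *this; }
  cache_desc &size_kb_(int v) { size_kb = v; return *this; }
};
```

A table could then be written as `def_array<cache_desc, 2>().set(1, cache_desc().line_size_(64).size_kb_(32))`, leaving unset entries value-initialized.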

Remove unneeded guards #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
in loongarch-def.h and loongarch-opts.h.

gcc/ChangeLog:

* config/loongarch/loongarch-def.h: Remove extern "C".
(loongarch_isa_base_strings): Declare as loongarch_def_array
instead of plain array.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(loongarch_cpu_strings): Likewise.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_isa): Add a constructor and field setter functions.
* config/loongarch/loongarch-opts.h (loongarch-defs.h): Do not
include for target libraries.
* config/loongarch/loongarch-opts.cc: Comment code that doesn't
run and causes compilation errors.
* config/loongarch/loongarch-tune.h (LOONGARCH_TUNE_H): Likewise.
(struct loongarch_rtx_cost_data): Likewise.
(struct loongarch_cache): Likewise.
(struct loongarch_align): Likewise.
* config/loongarch/t-loongarch: Compile loongarch-def.cc with the
C++ compiler.
* config/loongarch/loongarch-def-array.h: New file for a
std::array like data structure with position setter function.
* config/loongarch/loongarch-def.c: Rename to ...
* config/loongarch/loongarch-def.cc: ... here.
(loongarch_cpu_strings): Define as loongarch_def_array instead
of plain array.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_isa_base_strings): Likewise.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(abi_minimal_isa): Likewise.
(loongarch_rtx_cost_optimize_size): Use field setter functions
instead of designated initializers.
(loongarch_rtx_cost_data): Implement default constructor.

Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]

As documented, --param min-nondebug-insn-uid= is very useful in debugging
-fcompare-debug issues in RTL dumps, without it it is really hard to
find differences.  With it, DEBUG_INSNs generally use low INSN_UIDs
(1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
For good results, the parameter should be larger than the number of
DEBUG_INSNs in all or at least problematic functions, so I typically
use --param min-nondebug-insn-uid=10000 or --param
min-nondebug-insn-uid=1000.

The PR is about using --param min-nondebug-insn-uid=2147483647 (similar
behavior can be achieved with that minus some epsilon):
INSN_UIDs for the non-debug insns then wrap around and as they are signed,
all kinds of things break.  Obviously, that can happen even without that
option, but functions containing more than 2147483647 insns usually fail
to compile much earlier because they run out of memory.
As it is a debugging option, I'd prefer not to impose any drastically small
limits on it because if a function has a lot of DEBUG_INSNs, it is useful
to start still above them, otherwise the allocation of uids will DTRT
even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
allocations.

So, the following patch uses 0x40000000 limit, half the maximum amount for
DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
unlikely overflows in real world.

Note, using large min-nondebug-insn-uid is very expensive for compile time
memory and compile time, because DF as well as various RTL passes use
arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
ditto df (df->insns).

Now, in LRA I've run into ICEs already with
--param min-nondebug-insn-uid=0x2aaaaaaa
on 64-bit host.  It uses a custom vector management and wants to grow
allocation 1.5x when growing, but all this computation is done in int,
so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value.  And
unlike vec.cc growing which also uses unsigned int type for the above
(and the + 1 is not there), it also doesn't make sure if there is an
overflow that it allocates at least as much as needed, vec.cc
does
  if ...
  else
    /* Grow slower when large.  */
    alloc = (alloc * 3 / 2);

  /* If this is still too small, set it to the right size. */
  if (alloc < desired)
    alloc = desired;
so even if there is overflow during the * 1.5 computation, but
desired is still representable in the range of the alloced counter
(31-bits in both vec.h and LRA), it doesn't grow exponentially but
at least works for the current value.

The patch now uses there
  lra_insn_recog_data_len = index * 3U / 2;
  if (lra_insn_recog_data_len <= index)
    lra_insn_recog_data_len = index + 1;
basically do what vec.cc does.  I thought we could do better for
both vec.cc and LRA on 64-bit hosts even without growing the allocated
counters, but now that I look at it again, perhaps we can't.
The above overflows already with original alloc or lra_insn_recog_data_len
0x55555556, where 0x55555555 * 3U / 2 is still 0x7fffffff
and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
1.  I thought that we could use alloc * (size_t) 3 / 2 so that on 64-bit
hosts it wouldn't overflow that quickly, but 0x55555556 * (size_t) 3 / 2
there is 0x80000001 which is still ok in unsigned, but given that vec.h
then stores the counter into unsigned m_alloc:31; bit-field, it is too much.
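
The growth rule quoted above can be sketched as a standalone function. This is an illustration under the assumption of a 32-bit `int` counter; the name `grow_len` is hypothetical and stands in for the update of lra_insn_recog_data_len.

```cpp
#include <cassert>
#include <cstdint>

// Grow a length to roughly 1.5x the requested index using unsigned
// arithmetic, and fall back to index + 1 if the multiplication wrapped
// and the result no longer covers the index.
std::int32_t grow_len(std::int32_t index) {
  assert(index >= 0 && index < INT32_MAX);
  std::uint32_t uindex = static_cast<std::uint32_t>(index);
  std::uint32_t len = uindex * 3U / 2;   // may wrap modulo 2^32
  if (len <= uindex)
    len = uindex + 1;                    // still large enough for index
  return static_cast<std::int32_t>(len); // always fits: len <= 0x7fffffff
}
```

With this rule, an index of 0x55555555 grows to 0x7fffffff, while 0x55555556 wraps to 1 in the multiplication and is therefore set to 0x55555557.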

With the lra.cc change, one can actually compile simple function
with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
(i.e. the new limit), but already needed quite a big part of my 32GB
RAM + 24GB swap.
The patch adds a dg-skip-if for that case though, because such option
is way too much for 32-bit hosts even at -O0 and empty function,
and with -O3 on a longer function it is too much for average 64-bit host
as well.  Without the dg-skip-if I got on 64-bit host:
cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
and
cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
on 32-bit host.  A test requiring more than 532GB of RAM on 64-bit hosts
is just too much for our testsuite.

2023-12-08  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112411
* params.opt (-param=min-nondebug-insn-uid=): Add
IntegerRange(0, 1073741824).
* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
in * 3 / 2 computation and if the result is smaller or equal to
index, use index + 1.

* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
--param min-nondebug-insn-uid=1073741824.