git.ipfire.org Git - thirdparty/gcc.git/log

Pengxuan Zheng [Wed, 21 May 2025 00:58:23 +0000 (17:58 -0700)]

aarch64: Carry over zeroness in aarch64_evpc_reencode

There was a bug in aarch64_evpc_reencode which could leave zero_op0_p and
zero_op1_p of the struct "newd" uninitialized. r16-701-gd77c3bc1c35e303 fixed
the issue by zero initializing "newd". This patch provides an alternative fix
as suggested by Richard Sandiford based on the fact that the zeroness is
preserved by aarch64_evpc_reencode.
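
As a rough illustration of the approach described above, the following sketch fills in a new descriptor field by field and carries the two zeroness flags over from the original. The struct, the function name and the `nelt` field are hypothetical stand-ins; only `zero_op0_p` and `zero_op1_p` come from the commit message, and the real code in aarch64.cc may differ.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for the permutation descriptor.
// Only the two flags named in the commit message are modelled faithfully.
struct expand_vec_perm_sketch
{
  bool zero_op0_p;
  bool zero_op1_p;
  int nelt;  // illustrative extra field, not taken from the source
};

// Build the re-encoded descriptor without a blanket zero initialization,
// copying the zeroness flags from the original descriptor.
expand_vec_perm_sketch
reencode_sketch (const expand_vec_perm_sketch &d)
{
  expand_vec_perm_sketch newd;
  newd.nelt = d.nelt / 2;          // assumption: re-encoding halves the element count
  newd.zero_op0_p = d.zero_op0_p;  // carried over from d
  newd.zero_op1_p = d.zero_op1_p;  // carried over from d
  return newd;
}
```

Every field is assigned before `newd` is returned, so nothing is left uninitialized even though the struct is not zeroed first.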

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy zero_op0_p and
zero_op1_p from d to newd.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

Stephanos Ioannidis [Wed, 21 May 2025 23:28:36 +0000 (17:28 -0600)]

[PATCH] configure: Always add pre-installed header directories to search path

The configure script was adding the target directory flags, including the
'-B' flags for the executable prefix and the '-isystem' flags for the
pre-installed header directories, to the target flags only for
non-Canadian builds under the premise that the host binaries under the
executable prefix will not be able to execute on the build system for
Canadian builds.

While that is true for the '-B' flags specifying the executable prefix,
the '-isystem' flags specifying the pre-installed header directories are
not affected by this and do not need special handling.

This patch updates the configure script to always add the 'include' and
'sys-include' pre-installed header directories to the target search
path, in order to ensure that the availability of the pre-installed
header directories in the search path is consistent across non-Canadian
and Canadian builds.

When the '--with-headers' flag is specified, this effectively ensures that
the libc headers, which are copied from the specified header directory to
the sys-include directory, are used by libstdc++.

* configure.ac: Always add pre-installed headers to search path.
* configure: Regenerate.

Andrew Pinski [Mon, 5 May 2025 16:46:14 +0000 (09:46 -0700)]

combine: gen_lowpart_no_emit vs CLOBBER [PR120090]

The problem here is simplify-rtx.cc expects gen_lowpart_no_emit
to return NULL on failure but combine's hook was returning CLOBBER.
After r16-160-ge6f89d78c1a7528e93458278, gcc.target/i386/avx512bw-pr103750-2.c
started to fail at -m32 due to this, as new simplify code would return
an RTL with a clobber in it rather than returning NULL.
To fix this, gen_lowpart_no_emit should return NULL when there is a failure
instead of a clobber. This only changes the gen_lowpart_no_emit hook and not the
generic gen_lowpart hook as parts of combine just pass gen_lowpart result directly
without checking the return value.
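
The contract mismatch can be pictured with a toy model. In the sketch below, every type and function name is a hypothetical stand-in (none of it is GCC's real API): a general hook reports failure through a clobber-like marker, and a separate no-emit wrapper translates that marker into a null pointer for callers that expect NULL.

```cpp
#include <cassert>

// Toy stand-ins: a node carries a code, and CLOBBER_SKETCH marks failure.
enum code_sketch { REG_SKETCH, CLOBBER_SKETCH };
struct node_sketch { code_sketch code; };

node_sketch failure_marker = { CLOBBER_SKETCH };
node_sketch some_reg = { REG_SKETCH };

// Hypothetical general hook: failure is signalled by the clobber-like marker,
// which callers may pass along without checking.
node_sketch *lowpart_sketch (bool ok)
{
  return ok ? &some_reg : &failure_marker;
}

// Hypothetical no-emit variant: same result on success, but failure is
// reported as a null pointer, which is what the simplifier-style caller expects.
node_sketch *lowpart_no_emit_sketch (bool ok)
{
  node_sketch *x = lowpart_sketch (ok);
  return x->code == CLOBBER_SKETCH ? nullptr : x;
}
```

Keeping the general hook unchanged mirrors the reasoning in the message: only the variant whose callers test for NULL needs the different failure value.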

Bootstrapped and tested on x86_64-linux-gnu.

PR rtl-optimization/120090
gcc/ChangeLog:

* combine.cc (gen_lowpart_for_combine_no_emit): New function.
(RTL_HOOKS_GEN_LOWPART_NO_EMIT): Set to gen_lowpart_for_combine_no_emit.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Jeff Law [Wed, 21 May 2025 22:04:58 +0000 (16:04 -0600)]

[RISC-V] Improve (x << C1) + C2 split code

I wrote this a couple months ago to fix an instruction count regression in
505.mcf on risc-v, but I don't have a trivial little testcase to add to the
suite.

There were two problems with the pattern.

First, the code was generating a shift followed by an add after reload.
Naturally combine doesn't run after reload and the code stayed in that form
rather than using shadd when available.

Second, the splitter was just over-active.  We need to make sure that the
shifted form of the constant operand has a cost > 1 to synthesize.  It's
useless to split if the shifted constant can be synthesized in a single
instruction.

This has been in my tester since March.  So it's been through numerous
riscv64-elf and riscv32-elf test cycles as well as multiple rv64 bootstrap
tests.  Waiting on the upstream CI system to render a verdict before moving
forward.

Looking further out I'm hoping this pattern will transform into a simpler and
always active define_split.

gcc/
* config/riscv/riscv.md ((x << C1) + C2): Tighten split condition
and generate more efficient code when splitting.

Jeff Law [Wed, 21 May 2025 20:15:23 +0000 (14:15 -0600)]

[RISC-V][PR target/120368] Fix 32bit shift on rv64

So a followup to last week's bugfix.  In last week's change we stopped using
define_insn_and_split to rewrite instructions.  That change was done to avoid
dropping a masking instruction out of the RTL.

As a result the pattern(s) were changed into simple define_insns, which is
good.  One of them uses the GPR iterator since it's supposed to work for both
32bit and 64bit shifts on rv64.

But we failed to emit the right opcode for a 32bit shift on rv64. Thankfully
the fix is trivial.  If the mode is anything but word_mode, then we must be
doing a 32-bit shift on rv64, i.e. the various "w" shift instructions.
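
The rule in the previous paragraph can be sketched as a small selection function. The enum and function names below are hypothetical and exist only for illustration; the actual fix lives in the output template of the pattern in riscv.md and may be written quite differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical mode tags standing in for the machine modes involved.
enum mode_sketch { SImode_sketch, DImode_sketch };

// If the operation's mode is the word mode, use the plain mnemonic.
// Otherwise (a 32-bit shift on rv64) append "w", giving sllw/srlw/sraw.
std::string
shift_mnemonic_sketch (const std::string &base, mode_sketch mode,
                       mode_sketch word_mode)
{
  return mode == word_mode ? base : base + "w";
}
```

For example, `shift_mnemonic_sketch ("sll", SImode_sketch, DImode_sketch)` selects the word form, while the same call with `DImode_sketch` as the operation mode keeps the plain mnemonic.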

It's run through my tester.  Just waiting on the upstream CI system to spin it.

PR target/120368
gcc/
* config/riscv/riscv.md (shift with masked shift count): Fix
opcode when generating an SImode shift on rv64.

gcc/testsuite/
* gcc.target/riscv/pr120368.c: New test.

Pan Li [Tue, 20 May 2025 14:30:04 +0000 (22:30 +0800)]

RISC-V: Add test for vec_duplicate + vand.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check tests for the vec_duplicate + vand.vv combine to vand.vx,
with GR2VR costs of 0, 1 and 2.

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vand.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

Pan Li [Tue, 20 May 2025 07:06:34 +0000 (15:06 +0800)]

RISC-V: Add test for vec_duplicate + vand.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check tests for the vec_duplicate + vand.vv combine to vand.vx,
with GR2VR costs of 0, 2 and 15.

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add test cases
for vand vx combine case 0 on GR2VR cost.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vand.vx run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vand-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Pan Li [Tue, 20 May 2025 07:00:15 +0000 (15:00 +0800)]

RISC-V: Combine vec_duplicate + vand.vv to vand.vx on GR2VR cost

This patch combines vec_duplicate + vand.vv into vand.vx, as in the
example below.  The related pattern depends on the cost of vec_duplicate
from GR2VR: late-combine performs the combination if the GR2VR cost is
zero, and rejects it if the GR2VR cost is greater than zero.

Assume the example code below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, &)

Before this patch:
  test_vx_binary_and_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vand.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_and_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vand.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

The following test suites passed for this patch.
* The rv64gcv full regression test.
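
To make the transformation concrete, here is a scalar model of the two forms. It is a sketch only: the function names are hypothetical, plain loops stand in for the vector instructions, and the cost gate simply restates the rule from the description (combine when the GR2VR cost is zero).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of vec_duplicate + vand.vv: broadcast x into a vector, then AND
// element by element.
std::vector<int32_t>
and_vv_sketch (const std::vector<int32_t> &in, int32_t x)
{
  std::vector<int32_t> dup (in.size (), x);
  std::vector<int32_t> out (in.size ());
  for (std::size_t i = 0; i < in.size (); i++)
    out[i] = in[i] & dup[i];
  return out;
}

// Model of vand.vx: AND each element with the scalar directly, with no
// broadcast vector.
std::vector<int32_t>
and_vx_sketch (const std::vector<int32_t> &in, int32_t x)
{
  std::vector<int32_t> out (in.size ());
  for (std::size_t i = 0; i < in.size (); i++)
    out[i] = in[i] & x;
  return out;
}

// Cost gate as stated in the description: combine only when moving the
// scalar into a vector register is modelled as free.
bool
combine_allowed_sketch (int gr2vr_cost)
{
  return gr2vr_cost == 0;
}
```

Both functions compute the same result, which is why the combination is purely a question of cost.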

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case for rtx code AND.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op and to no_shift_vx_ops.

Signed-off-by: Pan Li <pan2.li@intel.com>

Alexandre Oliva [Wed, 21 May 2025 09:21:08 +0000 (06:21 -0300)]

[testsuite] [x86] vect-simd-clone-1[678]e.c adjust

Since r13-6296, we haven't got 4 simdclone calls for these tests on
ia32 without avx_runtime. With avx_runtime, we get 3 such calls even
on ia32, but we didn't test for anything on ia32 with avx_runtime.
Adjust and simplify the expectations and comments.

for gcc/testsuite/ChangeLog

* gcc.dg/vect/vect-simd-clone-16e.c: Expect fewer calls on ia32.
* gcc.dg/vect/vect-simd-clone-17e.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18e.c: Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:21:04 +0000 (06:21 -0300)]

[testsuite] [x86] pr31985.c needs -fomit-frame-pointer to match movl count

On an --enable-frame-pointer toolchain, pr31985.c gets an extra movl
and fails. Enable -fomit-frame-pointer explicitly.

for gcc/testsuite/ChangeLog

* gcc.target/i386/pr31985.c: Add -fomit-frame-pointer.

Alexandre Oliva [Wed, 21 May 2025 09:20:59 +0000 (06:20 -0300)]

[testsuite] [x86] pr108938-3.c needs -msse2 for bswap in foo2 with -m32

Without SSE2, we don't combine the separate loads in foo2 and get
separate rotates, instead of a bswap.

for gcc/testsuite/ChangeLog

* gcc.target/i386/pr108938-3.c: Add -msse2.

Alexandre Oliva [Wed, 21 May 2025 09:20:54 +0000 (06:20 -0300)]

[testsuite] [x86] no-callee-saved-16.c needs -fomit-frame-pointer

If the toolchain is built with --enable-frame-pointer,
gcc.target/i386/no-callee-saved-16.c will not get the expected
optimization without -fomit-frame-pointer, which -O2 would enable
without the configure flag. Add it.

for gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-16.c: Add -fomit-frame-pointer.

Alexandre Oliva [Wed, 21 May 2025 09:20:48 +0000 (06:20 -0300)]

[testsuite] add missing require vect_early_break_hw for vect-tsvc

Some tsvc tests add vect_early_break options without requiring the
feature to be available. Add the requirements.

for gcc/testsuite/ChangeLog

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Require vect_early_break_hw.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:20:42 +0000 (06:20 -0300)]

[testsuite] [x86] forwprop-41 needs -msse

The vector operations are only turned into BIT_INSERT_EXPR with -msse
on ia32.

for gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/forwprop-41.c: Add -msse on x86.

Alexandre Oliva [Wed, 21 May 2025 09:20:37 +0000 (06:20 -0300)]

[testsuite] [x86] strlenopt-80 needs -msse2 on ia32

The string length optimizations at 8-byte blocks require -msse2;
-msse is not enough. Bump it.

for gcc/testsuite/ChangeLog

* gcc.dg/strlenopt-80.c: Bump to -msse2.

Alexandre Oliva [Wed, 21 May 2025 09:20:33 +0000 (06:20 -0300)]

[testsuite] [x86] memcpy-6 needs -msse2

The 8-byte memory operations will only be inlined on ia32 with
-msse2. Bump it.

for gcc/testsuite/ChangeLog

* gcc.dg/memcpy-6.c: Bump to -msse2.

Alexandre Oliva [Wed, 21 May 2025 09:20:29 +0000 (06:20 -0300)]

[testsuite] [x86] double copysign requires -msse2

SSE_FLOAT_MODE_P only holds for DFmode with SSE2, and that's a
condition for copysign<mode>3 to be available under TARGET_SSE_MATH.

Various copysign testcases use -msse -mfpmath=sse on ia32 to enable
the copysign builtins and patterns, but that would only be enough if
the tests were limited to floats. Since they test doubles as well, we
need -msse2 instead of -msse.

for gcc/testsuite/ChangeLog

* gcc.dg/fold-copysign-1.c: Bump to sse2 on ia32.
* gcc.dg/pr55152-2.c: Likewise.
* gcc.dg/tree-ssa/abs-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:20:22 +0000 (06:20 -0300)]

[testsuite] [aarch64] match alt cache clear names in sme nonlocal_goto tests

vxworks calls cacheTextUpdate instead of __clear_cache.

Adjust the sme/nonlocal_goto_*.c tests for inexact matches.

for gcc/testsuite/ChangeLog

* gcc.target/aarch64/sme/nonlocal_goto_1.c: Match
vxworks cache-clearing function as well.
* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:20:17 +0000 (06:20 -0300)]

[testsuite] [aarch64] use uint64_t in rwsr tests

stdint.h defines uint64_t instead of __uint64_t, so use the former.
__uint64_t is not available on e.g. vxworks.

for gcc/testsuite/ChangeLog

* gcc.target/aarch64/acle/rwsr.c: Use uint64_t.
* gcc.target/aarch64/acle/rwsr-2.c: Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:20:11 +0000 (06:20 -0300)]

[testsuite] tolerate missing std::stold

basic_string.h doesn't define the non-w string version of std::stold
when certain conditions aren't met, and then a couple of tests fail to
compile.

Guard the portions of the tests that depend on std::stold with the
conditions for it to be defined.

for libstdc++-v3/ChangeLog

* testsuite/21_strings/basic_string/numeric_conversions/char/stold.cc:
Guard non-wide stold calls with conditions for it to be
defined.
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
Likewise.

Alexandre Oliva [Wed, 21 May 2025 09:20:03 +0000 (06:20 -0300)]

[testsuite] [analyzer] [vxworks] define __STDC_WANT_LIB_EXT1__ to 1

vxworks' headers use #if instead of #ifdef to test for
__STDC_WANT_LIB_EXT1__, so the definition in the analyzer test
strtok-cppreference.c catches a bug there, but not something it's
meant to catch or that we could fix in GCC, so amend the definition to
sidestep the libc bug.
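
The difference between the two preprocessor tests can be seen with a stand-in macro. The message does not say what the earlier definition looked like, so the sketch below only shows why a definition of 1 is safe under both styles; `WANT_EXT_SKETCH` is a hypothetical name used in place of `__STDC_WANT_LIB_EXT1__`.

```cpp
#include <cassert>

// Hypothetical stand-in for __STDC_WANT_LIB_EXT1__.  A definition with no
// value would still satisfy "#ifdef", but "#if WANT_EXT_SKETCH" would then
// have no expression to evaluate.  Defining it to 1 works with both forms.
#define WANT_EXT_SKETCH 1

#ifdef WANT_EXT_SKETCH
constexpr bool seen_by_ifdef = true;   // header style that only checks definedness
#else
constexpr bool seen_by_ifdef = false;
#endif

#if WANT_EXT_SKETCH
constexpr bool seen_by_if = true;      // header style that evaluates the value
#else
constexpr bool seen_by_if = false;
#endif

static_assert (seen_by_ifdef && seen_by_if,
               "a definition of 1 satisfies both #ifdef and #if");
```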

for gcc/testsuite/ChangeLog

* c-c++-common/analyzer/strtok-cppreference.c
(__STDC_WANT_LIB_EXT1__): Define to 1.

Alexandre Oliva [Wed, 21 May 2025 09:19:57 +0000 (06:19 -0300)]

[testsuite] [vxworks] netinet includes atomic, reqs c++11

On vxworks, the included netinet/in.h header indirectly includes
<atomic>, which fails with C++ < 11. Skip the test.

for gcc/testsuite/ChangeLog

* c-c++-common/analyzer/fd-glibc-byte-stream-socket.c: Skip on
vxworks with C++ < 11.

Alexandre Oliva [Wed, 21 May 2025 09:19:46 +0000 (06:19 -0300)]

vxworks: libgcc: include string.h for memset

gthr-vxworks-thread.c calls memset in __gthread_cond_signal, but it
fails to include <string.h>, where this function is declared, and GCC
14 rejects calls of undeclared functions. Include the required
header.

for libgcc/ChangeLog

* config/gthr-vxworks-thread.c: Include string.h for memset.

Richard Sandiford [Wed, 21 May 2025 09:01:32 +0000 (10:01 +0100)]

genemit: Use a byte encoding to generate insns

genemit has traditionally used open-coded gen_rtx_FOO sequences
to build up the instruction pattern.  This is now the source of
quite a bit of bloat in the binary, and also a source of slow
compile times.

Two obvious ways of trying to deal with this are:

(1) Try to identify rtxes that have a similar form and use shared
    routines to generate rtxes of that form.

(2) Use a static table to encode the rtx and call a common routine
    to expand it.

I did briefly look at (1).  However, it's more complex than (2),
and I think suffers from being the worst of both worlds, for reasons
that I'll explain below.  This patch therefore does (2).
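
To give a feel for option (2), here is a minimal table-driven expander. The encoding is invented purely for illustration and is not the format genemit actually emits; `node_sketch`, `OP_OPERAND`, `OP_NODE` and `expand_sketch` are all hypothetical names. The point is only that one shared routine walks a static byte table instead of each pattern having its own open-coded construction sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy expression node: a code plus child nodes.
struct node_sketch
{
  uint8_t code;
  std::vector<std::shared_ptr<node_sketch>> kids;
};
using node_ptr = std::shared_ptr<node_sketch>;

// Invented opcodes:
//   OP_OPERAND <n>                  -> use operands[n]
//   OP_NODE <code> <nkids> <kid>... -> build a node from nkids encoded kids
enum : uint8_t { OP_OPERAND = 0, OP_NODE = 1 };

// Decode one expression starting at pc, advancing pc past it.
node_ptr
expand_sketch (const uint8_t *&pc, const std::vector<node_ptr> &operands)
{
  uint8_t op = *pc++;
  if (op == OP_OPERAND)
    return operands[*pc++];

  auto n = std::make_shared<node_sketch> ();
  n->code = *pc++;
  uint8_t nkids = *pc++;
  for (uint8_t i = 0; i < nkids; i++)
    n->kids.push_back (expand_sketch (pc, operands));
  return n;
}
```

With this layout, a pattern such as "node 7 applied to operand 0 and to node 9 of operand 1" is the ten-byte table `{ OP_NODE, 7, 2, OP_OPERAND, 0, OP_NODE, 9, 1, OP_OPERAND, 1 }`, and every pattern shares the same expansion routine.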

In theory, one of the advantages of open-coding the calls to
gen_rtx_FOO is that the rtx can be populated using stores of known
constants (for the rtx code, mode, unspec number, etc).  However,
the time spent constructing an rtx is likely to be dominated by
the call to rtx_alloc, rather than by the stores to the fields.

Option (1) above loses this advantage of storing constants.
The shared routines would parameterise an rtx according to things
like the modes on the rtx and its suboperands, so the code would
need to fetch the parameters.  In a sense, the rtx structure would
be open-coded but the parameters would be table-encoded (albeit
in a simple way).

The expansion code also shouldn't be particularly hot.  Anything that
treats expand/discard cycles as very cheap would be misconceived,
since each discarded expansion generates garbage memory that needs
to be cleaned up later.

Option (2) turns out to be pretty simple -- certainly simpler
than (1) -- and seems to give a reasonable saving.  Some numbers,
all for --enable-checking=yes,rtl,extra:

[A] size of the @progbits sections in insn-emit-*.o, new / old
[B] size of the load segments in cc1, new / old
[C] time to compile a typical insn-emit*.cc, new / old

Target                 [A]      [B]      [C]
--------------------------------------------
native aarch64      0.5627   0.9585   0.5677
native x86_64       0.5925   0.9467   0.6377
aarch64-x-riscv64   0.5555   0.9066   0.2762

To get an idea of the effect on the final compiler, I tried compiling
fold-const.ii with -O0 (no -g), since that should give any slowdown
less room to hide.  I couldn't measure any difference in compile time
before or after the patch for any of the three variants above.

gcc/
* gensupport.h (needs_barrier_p): Delete.
* gensupport.cc (needs_barrier_p): Likewise.
* rtl.h (always_void_p): Return true for PC, RETURN and SIMPLE_RETURN.
(expand_opcode): New enum class.
(expand_rtx, complete_seq): Declare.
* emit-rtl.cc (rtx_expander): New class.
(expand_rtx, complete_seq): New functions.
* gengenrtl.cc (special_rtx, excluded_rtx): Add a cross-reference
comment.
* genemit.cc (FIRST_CODE): New constant.
(print_code): Delete.
(generator::file, generator::used, generator::sequence_type): Delete.
(generator::bytes): New member variable.
(generator::generator): Update accordingly.
(generator::gen_rtx_scratch): Delete.
(generator::add_uint, generator::add_opcode, generator::add_code)
(generator::add_match_operator, generator::add_exp)
(generator::add_vec, generator::gen_table): New member functions.
(generator::gen_exp): Rewrite to use a bytecode expansion.
(generator::gen_emit_seq): Likewise.
(start_gen_insn): Return the C++ expression for the operands array.
(gen_insn, gen_expand, gen_split): Update callers accordingly.
(emit_c_code): Remove use of _val.

Richard Sandiford [Wed, 21 May 2025 09:01:31 +0000 (10:01 +0100)]

genemit: Avoid using gen_exp in output_add_clobbers

output_add_clobbers emits code to add:

  (clobber (scratch:M))

and/or:

  (clobber (reg:M R))

expressions to the end of a PARALLEL.  At the moment, it does this
using the general gen_exp function.  That makes sense with the code
in its current form, but with later patches it's more convenient to
handle the two cases directly.

This also avoids having to pass an md_rtx_info that is unrelated
to the clobber expressions.

gcc/
* genemit.cc (clobber_pat::code): Delete.
(maybe_queue_insn): Don't set clobber_pat::code.
(output_add_clobbers): Remove info argument and output the two
REG and SCRATCH cases directly.
(main): Update call accordingly.

Richard Sandiford [Wed, 21 May 2025 09:01:31 +0000 (10:01 +0100)]

genemit: Remove support for string operands

gen_exp currently supports the 's' (string) operand type.  It would
certainly be possible to make the upcoming bytecode patch support
that too.  However, the rtx codes that have string operands should
be very rarely used in hard-coded define_insn/expand/split/peephole2
rtx templates (as opposed to things like attribute expressions,
where const_string is commonplace).  And AFAICT, no current target
does use them like that.

This patch therefore reports an error for these rtx codes,
rather than adding code that would be unused and untested.

gcc/
* genemit.cc (generator::gen_exp): Report an error for 's' operands.

Richard Sandiford [Wed, 21 May 2025 09:01:30 +0000 (10:01 +0100)]

genemit: Remove purported handling of location_ts

gen_exp had code to handle the 'L' operand format. But this format
is specifically for location_ts, which are only used in RTX_INSNs.
Those should never occur in this context, where the input is always
an md file rather than an __RTL function. Any hard-coded raw
location value would be meaningless anyway.

It seemed safer to turn this into an error rather than a gcc_unreachable.

gcc/
* genemit.cc (generator::gen_exp): Raise an error if we see
an 'L' operand.

Richard Sandiford [Wed, 21 May 2025 09:01:30 +0000 (10:01 +0100)]

genemit: Always track multiple uses of operands

gen_exp has code to detect when the same operand is used multiple
times.  It ensures that second and subsequent uses call copy_rtx,
to enforce correct unsharing.
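
The tracking described above can be modelled in a few lines. This is a sketch under assumptions: `node_sketch` and `operand_user_sketch` are hypothetical, and a shallow copy through `std::make_shared` stands in for copy_rtx. The first use of an operand hands out the original object; every later use hands out a fresh copy so that no object appears twice in the result.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct node_sketch { int value; };
using node_ptr = std::shared_ptr<node_sketch>;

// Hands out operands while remembering which ones were already used.
struct operand_user_sketch
{
  std::vector<node_ptr> operands;
  std::vector<bool> used;  // used[n] is true once operand n has been handed out

  explicit operand_user_sketch (std::vector<node_ptr> ops)
    : operands (std::move (ops)), used (operands.size (), false)
  {}

  node_ptr use (std::size_t n)
  {
    if (!used[n])
      {
        used[n] = true;
        return operands[n];  // first use: the original object
      }
    // Second and subsequent uses: a copy, so the object is not shared.
    return std::make_shared<node_sketch> (*operands[n]);
  }
};
```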

However, for historical reasons that aren't clear to me, this was
skipped for a define_insn unless the define_insn was a parallel.
It was also skipped for a single define_expand instruction,
regardless of its contents.

This meant that a single parallel instruction was treated differently
between define_insn (where sharing rules were followed) and
define_expand (where sharing rules weren't followed).  define_splits
and define_peephole2s followed the sharing rules in all cases.

This patch makes everything follow the sharing rules.  The code
it touches will be removed by the proposed bytecode-based expansion,
which will use its own tracking when enforcing sharing rules.
However, it seemed better for staging and bisection purposes
to make this change first.

gcc/
* genemit.cc (generator::used): Update comment.
(generator::gen_exp): Remove handling of null unused arrays.
(gen_insn, gen_expand): Always pass a used array.
(output_add_clobbers): Note why the used array is null here.

Richard Sandiford [Wed, 21 May 2025 09:01:29 +0000 (10:01 +0100)]

genemit: Add a generator struct

gen_exp now has quite a few arguments that need to be passed
to each recursive call. This patch turns it and related routines
into member functions of a new generator class, so that the shared
information can be stored in member variables.

This also helps to make later patches less noisy.

gcc/
* genemit.cc (generator): New structure.
(gen_rtx_scratch, gen_exp, gen_emit_seq): Turn into member
functions of generator.
(gen_insn, gen_expand, gen_split, output_add_clobbers): Update
users accordingly.

Richard Sandiford [Wed, 21 May 2025 09:01:29 +0000 (10:01 +0100)]

genemit: Consistently use operand arrays in gen_* functions

One slightly awkward part about emitting the generator function
bodies is that:

* define_insn and define_expand routines have a separate argument for
  each operand, named "operand0" upwards.

* define_split and define_peephole2 routines take a pointer to an array,
  named "operands".

* the C++ preparation code for expands, splits and peephole2s uses an
  array called "operands" to refer to the operands.

* the automatically-generated code uses individual "operand<N>"
  variables to refer to the operands.

So define_expands have to store the incoming arguments into an operands
array before the md file's C++ code, then copy the operands array back
to the individual variables before the automatically-generated code.
splits and peephole2s have to copy the incoming operands array to
individual variables after the md file's C++ code, creating more
local variables that are live across calls to rtx_alloc.

This patch tries to simplify things by making the whole function
body use the operands array in preference to individual variables.
define_insns and define_expands store their arguments to the array
on entry.

This would have pros and cons on its own, but having a single array
helps with future efforts to reduce the duplication between gen_*
functions.
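
A reduced picture of the resulting function shape, assuming a two-operand pattern: the names `node_sketch` and `gen_example_sketch` and the `replacement` parameter are hypothetical and only serve the illustration. The incoming arguments are const and are stored into the array on entry, and everything after that reads and writes `operands[]` only.

```cpp
#include <cassert>

struct node_sketch { int id; };

// Hypothetical generated function for a two-operand pattern.
node_sketch *
gen_example_sketch (node_sketch *const operand0, node_sketch *const operand1,
                    node_sketch *replacement)
{
  // Store the incoming arguments to the array on entry.
  node_sketch *operands[2] = { operand0, operand1 };

  // Stand-in for the md file's preparation code, which updates the array.
  operands[1] = replacement;
  // operand1 = replacement;  // would not compile: the parameter is const

  // Stand-in for the automatically generated code, which reads the array
  // rather than the individual arguments.
  return operands[1];
}
```

Because the parameters are const, an accidental write to `operand1` instead of `operands[1]` is rejected at compile time rather than silently ignored.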

gcc/
* genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
operand%d.
(start_gen_insn): Mark the incoming arguments as const and store
them to an operands array.
(gen_expand, gen_split): Remove copies into and out of the operands
array.

Richard Sandiford [Wed, 21 May 2025 09:01:28 +0000 (10:01 +0100)]

genemit: Factor out code common to insns and expands

Mostly to reduce cut-&-paste.

gcc/
* genemit.cc (start_gen_insn): New function, split out from...
(gen_insn, gen_expand): ...here.

Richard Sandiford [Wed, 21 May 2025 09:01:28 +0000 (10:01 +0100)]

genemit: Add an internal queue

An earlier version of this series wanted to collect information
about all the gen_* functions that are going to be generated.
The current version no longer does that, but the queue seemed
worth keeping anyway, since it gives a more consistent structure.

gcc/
* genemit.cc (queue): New static variable.
(maybe_queue_insn): New function, split out from...
(gen_insn): ...here.
(queue_expand): New function, split out from...
(gen_expand): ...here.
(queue_split): New function, split out from...
(gen_split): ...here.
(main): Queue definitions for later processing rather than
emitting them on the fly.

Richard Sandiford [Wed, 21 May 2025 09:01:27 +0000 (10:01 +0100)]

genemit: Use references rather than pointers

This patch makes genemit.cc pass the md_rtx_info around by constant
reference rather than pointer. It's somewhat of a cosmetic change
on its own, but it makes later changes less noisy.

gcc/
* genemit.cc (gen_exp): Make the info argument a constant reference.
(gen_emit_seq, gen_insn, gen_expand, gen_split): Likewise.
(output_add_clobbers): Likewise.
(main): Update calls accordingly.

Richard Sandiford [Wed, 21 May 2025 09:01:27 +0000 (10:01 +0100)]

sparc: Avoid operandN variables in .md files

The automatically-generated gen_* routines take their operands as
individual arguments, named "operand0" upwards.  These arguments are
stored into an "operands" array before invoking the expander's C++
code, which can then modify the operands by writing to the array.

However, the SPARC sign-extend and zero-extend expanders used the
operandN variables directly, rather than operands[N].  That's a
correct usage in context, since the code goes on to expand the
pattern manually and invoke DONE.

But it's also easy for code to accidentally write to operandN instead
of operands[N] when trying to set up something like a match_dup.
It sounds like Jeff had seen an instance of this.

A later patch is therefore going to mark the operandN arguments
as const.  This patch makes way for that by using operands[N]
instead of operandN for the SPARC expanders.

gcc/
* config/sparc/sparc.md (zero_extendhisi2, zero_extendhidi2)
(extendhisi2, extendqihi2, extendqisi2, extendqidi2)
(extendhidi2): Use operands[0] and operands[1] instead of
operand0 and operand1.

Richard Sandiford [Wed, 21 May 2025 09:01:26 +0000 (10:01 +0100)]

xstormy16: Avoid accessing beyond the operands[] array

The negsi2 C++ code writes to operands[2] even though the pattern
has no operand 2.

gcc/
* config/stormy16/stormy16.md (negsi2): Remove unused assignment.

Richard Sandiford [Wed, 21 May 2025 09:01:26 +0000 (10:01 +0100)]

nds32: Avoid accessing beyond the operands[] array

This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:

array subscript 2 is above array bounds of ‘rtx_def* [2]’

gcc/
PR target/100837
* config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
a local variable instead of operands[2].

Iain Sandoe [Mon, 12 May 2025 19:38:48 +0000 (20:38 +0100)]

c++, coroutines: Clean up the ramp cleanups.

This replaces the cleanup try-catch block in the ramp with a series of
eh-only cleanup statements.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Replace ramp
cleanup try-catch block with eh-only cleanup statements.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

commit | commitdiff | tree

Iain Sandoe [Sun, 11 May 2025 19:36:58 +0000 (20:36 +0100)]

c++, coroutines: Use decltype(auto) for the g_r_o.

The revised wording for coroutines uses decltype(auto) for the
type of the get return object, which preserves references.

It is quite reasonable for a coroutine body implementation to
complete before control is returned to the ramp - and in that
case we would be creating the ramp return object from an already-
deleted promise object.

Jason observes that this is a terrible situation and we should
seek a resolution to it via core.

Since the test added here explicitly performs the unsafe action
described above, we expect it to fail (until a resolution is found).

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction. Run the test and XFAIL it.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

commit | commitdiff | tree

Iain Sandoe [Mon, 12 May 2025 18:47:42 +0000 (19:47 +0100)]

c++, coroutines: Address CWG2563 return value init [PR119916].

This addresses the clarification that, when the get_return_object is of a
different type from the ramp return, any necessary conversions should be
performed on the return expression (so that they typically occur after the
function body has started execution).

PR c++/119916

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Do not
initialise initial_await_resume_called here...
(cp_coroutine_transform::build_ramp_function): ... but here.
When the coroutine is not void, initialize a GRO object from
promise.get_return_object(). Use this as the argument to the
return expression. Use a regular cleanup for the GRO, since
it is ramp-local.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/special-termination-00-sync-completion.C:
Amend for CWG2563 expected behaviour.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C:
Likewise.
* g++.dg/coroutines/torture/pr119916.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

commit | commitdiff | tree

Xℹ Ruoyao [Fri, 10 Jul 2020 13:38:09 +0000 (21:38 +0800)]

libstdc++: use maintained size when split pb_ds binary search trees

libstdc++-v3/ChangeLog:

PR libstdc++/81806
* include/ext/pb_ds/detail/bin_search_tree_/split_join_fn_imps.hpp
(split_finish): Use maintained size, instead of calling
std::distance.

commit | commitdiff | tree

Xℹ Ruoyao [Fri, 10 Jul 2020 12:58:04 +0000 (20:58 +0800)]

libstdc++: maintain subtree size in pb_ds binary search trees

libstdc++-v3/ChangeLog:

* include/ext/pb_ds/detail/rb_tree_map_/node.hpp
(rb_tree_node_::size_type): New typedef.
(rb_tree_node_::m_subtree_size): New field.
* include/ext/pb_ds/detail/splay_tree_/node.hpp
(splay_tree_node_::size_type): New typedef.
(splay_tree_node_::m_subtree_size): New field.
* include/ext/pb_ds/detail/bin_search_tree_/bin_search_tree_.hpp
(PB_DS_BIN_TREE_NAME::update_subtree_size): Declare new member
function.
* include/ext/pb_ds/detail/bin_search_tree_/rotate_fn_imps.hpp
(update_subtree_size): Define.
(apply_update, update_to_top): Call update_subtree_size.
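
As a rough illustration of the idea, the sketch below keeps a per-node subtree
size and refreshes it after a rotation, so that a size can later be read
directly rather than counted with std::distance. The Node type, size_of,
rotate_left and subtree_size_demo are hypothetical names for this example;
only the update_subtree_size name is taken from the ChangeLog above, and the
real pb_ds implementation differs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type; pb_ds's real nodes carry much more state.
struct Node {
  Node *left = nullptr;
  Node *right = nullptr;
  std::size_t subtree_size = 1;  // counts this node plus both subtrees
};

std::size_t size_of(const Node *n) { return n ? n->subtree_size : 0; }

// Recompute a node's size from its children (which must be up to date).
void update_subtree_size(Node *n)
{
  n->subtree_size = 1 + size_of(n->left) + size_of(n->right);
}

// Left rotation: x's right child y becomes the new subtree root.
Node *rotate_left(Node *x)
{
  Node *y = x->right;
  x->right = y->left;
  y->left = x;
  update_subtree_size(x);  // x is now the lower node, so fix it first
  update_subtree_size(y);
  return y;
}

// Builds the chain a -> b -> c (right children only) and rotates at a.
bool subtree_size_demo()
{
  Node a, b, c;
  b.right = &c;
  update_subtree_size(&b);
  a.right = &b;
  update_subtree_size(&a);
  assert(a.subtree_size == 3);

  Node *root = rotate_left(&a);
  assert(root == &b);
  // The size is now read in constant time; no traversal is needed.
  return root->subtree_size == 3 && a.subtree_size == 1
         && c.subtree_size == 1;
}
```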

commit | commitdiff | tree

Xℹ Ruoyao [Fri, 10 Jul 2020 12:10:52 +0000 (20:10 +0800)]

libstdc++: remove two redundant statements in pb_ds binary tree

libstdc++-v3/ChangeLog:

* include/ext/pb_ds/detail/bin_search_tree_/insert_fn_imps.hpp
(insert_leaf_new, insert_imp_empty): remove redundant statements.

commit | commitdiff | tree

Andrew Pinski [Tue, 20 May 2025 20:21:28 +0000 (13:21 -0700)]

middle-end: Fix complex lowering of cabs with no LHS [PR120369]

This was introduced by r15-1797-gd8fe4f05ef448e . I had missed that
the LHS of the cabs call could be NULL. This seems to only happen at -O0;
I tried to produce a case that happens at -O1 but needed many different
options to prevent the removal of the call.
Anyway, the fix is just to keep the call around if the LHS is null.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/120369

gcc/ChangeLog:

* tree-complex.cc (gimple_expand_builtin_cabs): Return early
if the LHS of cabs is null.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr120369-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Shreya Munnangi [Wed, 21 May 2025 02:15:42 +0000 (20:15 -0600)]

[RISC-V] Infrastructure of synthesizing logical AND with constant

So this is the next step on the path to mvconst_internal removal and is work
from Shreya and myself.

This puts in the infrastructure to allow us to synthesize logical AND much like
we're doing with logical IOR/XOR.

Unlike IOR/XOR, AND has many more special cases that can be profitable. For
example, you can use shifts to clear many bits.  You can use zero extension to
clear bits, you can use rotate+andi+rotate, shift pairs, etc.
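
The special cases listed above can be checked with a few bit identities. The
sketch below is plain C++ rather than RISC-V code, and the function names are
hypothetical; it only demonstrates that a shift pair or a rotate/AND/rotate
sequence computes the same value as an AND with the corresponding constant.
The rotate case assumes a small mask such as ~0xF (that is, -16) fits the AND
immediate once the bits have been rotated into the low positions.

```cpp
#include <cassert>
#include <cstdint>

// Clear the top n bits (0 < n < 64) with a shift pair instead of a mask.
constexpr std::uint64_t clear_high_bits(std::uint64_t x, unsigned n)
{
  return (x << n) >> n;
}

// Clear the low n bits (0 < n < 64) with the opposite shift pair.
constexpr std::uint64_t clear_low_bits(std::uint64_t x, unsigned n)
{
  return (x >> n) << n;
}

// Rotations, valid for 0 < n < 64.
constexpr std::uint64_t rotate_right64(std::uint64_t x, unsigned n)
{
  return (x >> n) | (x << (64 - n));
}

constexpr std::uint64_t rotate_left64(std::uint64_t x, unsigned n)
{
  return (x << n) | (x >> (64 - n));
}

// Clear bits 40..43: rotate them down to bits 0..3, AND with a small
// constant, then rotate back.
constexpr std::uint64_t clear_bits_40_to_43(std::uint64_t x)
{
  return rotate_left64(rotate_right64(x, 40) & ~std::uint64_t{0xF}, 40);
}

bool and_synthesis_demo()
{
  const std::uint64_t x = 0x0123456789ABCDEFULL;
  // Clearing the top 32 bits is the same as zero-extending the low word.
  assert(clear_high_bits(x, 32) == (x & 0x00000000FFFFFFFFULL));
  assert(clear_low_bits(x, 12) == (x & ~std::uint64_t{0xFFF}));
  return clear_bits_40_to_43(x) == (x & ~(std::uint64_t{0xF} << 40));
}
```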

So to make potential bisecting easy the plan is to drop in the work on logical
AND in several steps, essentially one new case at a time.

This step just puts the basics of an operation synthesis in place.  It still
uses the same code generation strategies as we are currently using.

I'd like to say this is NFC, but unfortunately that's not true.  While the code
generation strategy is the same, this does indirectly introduce new REG_EQUAL
notes.  Those additional notes in turn can impact how various optimizers behave
in very minor ways.

As usual, this has survived my tester on riscv32-elf and riscv64-elf.

Waiting on pre-commit to do its thing.  And I'll start queuing up the
additional cases we want to handle while waiting 😉

gcc/
* config/riscv/riscv-protos.h (synthesize_and): Prototype.
* config/riscv/riscv.cc (synthesize_and): New function.
* config/riscv/riscv.md (and<mode>3): Use it.

Co-Authored-By: Jeff Law <jlaw@ventanamicro.com>

commit | commitdiff | tree

liuhongt [Wed, 26 Feb 2025 06:48:27 +0000 (22:48 -0800)]

Add pattern match in match.pd for .AVG_CEIL

1) Optimize (a >> 1) + (b >> 1) + ((a | b) & 1) to .AVG_CEIL (a, b)
2) Optimize (a | b) - ((a ^ b) >> 1) to .AVG_CEIL (a, b)
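
Both forms compute the rounded-up average without overflowing the element
type. As a sanity check, the sketch below compares them against a reference
computed in a wider type for every pair of 8-bit inputs. The function names
are hypothetical and the code is only an illustration of the identities, not
of the match.pd patterns themselves.

```cpp
#include <cassert>
#include <cstdint>

// Reference: ceiling average computed in a wider type, so a + b + 1
// cannot overflow.
constexpr unsigned avg_ceil_ref(std::uint8_t a, std::uint8_t b)
{
  return (static_cast<unsigned>(a) + b + 1u) >> 1;
}

// Form 1: halve each input, then add back the rounding bit.
constexpr unsigned avg_ceil_form1(std::uint8_t a, std::uint8_t b)
{
  return (a >> 1) + (b >> 1) + ((a | b) & 1);
}

// Form 2: a | b equals (a & b) + (a ^ b); subtracting half of the
// differing bits leaves the rounded-up average.
constexpr unsigned avg_ceil_form2(std::uint8_t a, std::uint8_t b)
{
  return (a | b) - ((a ^ b) >> 1);
}

// Exhaustively compare both forms with the reference for 8-bit inputs.
bool avg_ceil_forms_agree()
{
  assert(avg_ceil_ref(255, 255) == 255);
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        const std::uint8_t x = static_cast<std::uint8_t>(a);
        const std::uint8_t y = static_cast<std::uint8_t>(b);
        if (avg_ceil_form1(x, y) != avg_ceil_ref(x, y)
            || avg_ceil_form2(x, y) != avg_ceil_ref(x, y))
          return false;
      }
  return true;
}
```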

gcc/ChangeLog:

PR middle-end/118994
* match.pd ((a >> 1) + (b >> 1) + ((a | b) & 1) to
.AVG_CEIL (a, b)): New pattern.
((a | b) - ((a ^ b) >> 1) to .AVG_CEIL (a, b)): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr118994-1.c: New test.
* gcc.target/i386/pr118994-2.c: New test.

commit | commitdiff | tree

GCC Administrator [Wed, 21 May 2025 00:17:57 +0000 (00:17 +0000)]

Daily bump.

commit | commitdiff | tree

Andrew Pinski [Sun, 18 May 2025 07:06:38 +0000 (00:06 -0700)]

match: Remove valueize_condition argument from gimple_extra template

After r15-4791-gb60031e8f9f8fe, the valueize_condition argument became
unused. I didn't notice because the -Wno-unused option was being added
when compiling gimple-match-exports.cc. This removes that option too, as
there are no longer any unused warnings.

gcc/ChangeLog:

* Makefile.in (gimple-match-exports.o-warn): Remove.
* gimple-match-exports.cc (gimple_extract): Remove valueize_condition
argument.
(gimple_extract_op): Update call to gimple_extract.
(gimple_simplify): Likewise. Also remove valueize_condition lambda.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Robert Dubner [Tue, 20 May 2025 17:35:15 +0000 (13:35 -0400)]

cobol: Multiple PRs; formatting; exception processing.

The PRs mentioned here have either been previously fixed, or are fixed by
this commit.

gcc/cobol/ChangeLog:

PR cobol/119770
PR cobol/119772
PR cobol/119790
PR cobol/119771
PR cobol/119810
PR cobol/119335
PR cobol/119632
* cdf-copy.cc (GLOB_BRACE): Eliminate <glob.h>.
* cdfval.h (_CDF_VAL_H_): Switch to C++ headers.
* copybook.h (class copybook_elem_t): Eliminate <glob.h>.
(class copybook_t): Likewise.
* gcobc: Numerous changes to improve utility.
* gcobol.1: Correct names in the list of functions.
* genapi.cc (compare_binary_binary): Use has_attr() function.
* lexio.cc (cdftext::lex_open): Typo; filename logic.
(cdftext::process_file): Filename logic.
* parse.y: Numerous parsing changes.
* parse_ante.h (new_alphanumeric): C++ includes; changes to temporaries.
(new_tempnumeric): Likewise.
(new_tempnumeric_float): Likewise.
(set_real_from_capacity): Created.
* scan.l: Use yy_pop_state().
* scan_ante.h (typed_name): Find figconst from data.initial.
* symbols.cc (symbol_valid_udf_args): Eliminate.
(symbols_update): figconst processing.
(new_temporary_impl): For functions, set .initial to function name.
(temporaries_t::acquire): Likewise.
(new_alphanumeric): Likewise.
(new_temporary): Likewise.
* symbols.h (_SYMBOLS_H_): Use C++ includes.
(cbl_figconst_tok): Change handling of figconst.
(cbl_figconst_field_of): Change handling of figconst.
(symbol_valid_udf_args): Eliminate.
* symfind.cc (symbol_match2): Change declaration.
(symbol_match): Change declaration.

libgcobol/ChangeLog:

* charmaps.cc: Switch to C++ includes.
* common-defs.h: Likewise.
* constants.cc: Likewise.
* ec.h: Remove #include <assert.h>.
* gcobolio.h (GCOBOLIO_H_): Switch to C++ includes.
* gfileio.cc: Likewise.
* gmath.cc: Likewise.
* intrinsic.cc: Comment formatting; C++ includes.
* io.cc: C++ includes.
* libgcobol.cc (__gg__stash_exceptions): Eliminate.
* valconv.cc: Switch to C++ includes.

Co-Authored-By: James K. Lowden <jklowden@cobolworx.com>

commit | commitdiff | tree

Umesh Kalappa [Tue, 20 May 2025 17:57:00 +0000 (11:57 -0600)]

[PATCH v2 2/2] MIPS p8700 doesn't have vector extension and added the dummies reservation for the same.

The RISC-V backend requires all types to map to a reservation in the
scheduler model. This adds types to a dummy reservation for all the
types not currently handled by the p8700 model.

gcc/
* config/riscv/mips-p8700.md (mips_p8700_dummies): New
reservation.
(mips_p8700_unknown): Reservation for all the dummies.

commit | commitdiff | tree

Umesh Kalappa [Tue, 20 May 2025 17:50:46 +0000 (11:50 -0600)]

[PATCH v2 1/2] The following changes enable P8700 processor for RISCV and P8700 is a high-performance processor from MIPS by extending RISCV with custom instructions

Add support for the p8700 design from MIPS.

gcc/
* config/riscv/mips-p8700.md: New scheduler model.
* config/riscv/riscv-cores.def (mips-p8700): New tuning model
and core architecture.
* config/riscv/riscv-opts.h (riscv_microarchitecture_type): Add
mips-p8700.
* config/riscv/riscv.cc (mips_p8700_tune_info): New uarch
tuning parameters.
* config/riscv/riscv.md (tune): Add mips_p8700.
Include mips-p8700.md
* doc/invoke.texi: Document tune/cpu options for the MIPS P8700.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

commit | commitdiff | tree

Robert Dubner [Tue, 20 May 2025 15:49:43 +0000 (11:49 -0400)]

cobol: sqrt(0) is not an ec-argument error. [PR119885]

libgcobol

PR cobol/119885
* intrinsic.cc (__gg__sqrt): Change test from <= zero to < zero.

gcc/testsuite

* cobol.dg/group2/FUNCTION_SQRT__2_.cob: Testcase.
* cobol.dg/group2/FUNCTION_SQRT__2_.out: Known-good for the testcase.

commit | commitdiff | tree

Tomasz Kamiński [Thu, 24 Apr 2025 14:03:27 +0000 (16:03 +0200)]

libstdc++: Cleanup and stabilize format _Spec<_CharT> and _Pres_type.

This patch makes the following changes to _Pres_type values:
* _Pres_esc is replaced with a separate _M_debug flag.
* _Pres_s, _Pres_p do not overlap with _Pres_none.
* hexadecimal presentation uses the same values for pointer, integer
   and floating point types.

The members of _Spec<_CharT> are rearranged so the class contains 8 bits
reserved for future use (_M_reserved) and 8 bits of tail padding.
Derived classes (like _ChronoSpec<_CharT>) can reuse the storage for initial
members. We also add _SpecBase as the base class for _Spec<_CharT> to make
it non-C++98 POD, which allows tail padding to be reused on Itanium ABI.

Finally, the format enumerators are defined as enum class with unsigned
char as underlying type, followed by using enum to bring names into scope.
_Term_char names are adjusted for consistency, and enumerator values are
changed so they can fit in smaller bitfields.

The '?' is changed to a separate _M_debug flag, to allow the debug format to be
independent of the presentation type, and applied to multiple presentation
types. For example, it could be used to trigger memberwise or reflection-based
formatting.

The _M_format_character and _M_format_character_escaped functions are merged
into a single function that handles normal and debug presentation. In particular,
this would allow future support for '?c' for printing integer types as escaped
characters. _S_character_width is also folded into the merged function.

Decoupling the _Pres_s value from _Pres_none allows it to be used for string
presentation for range formatting, and removes the need for separate _Pres_seq
and _Pres_str. This does not affect formatting of bool, as __formatter_int::_M_parse
overrides the default value of _M_type. And with the separation of the _M_debug flag,
__formatter_str::format behavior is now agnostic to the _M_type value.

The values for integer presentation types are arranged so textual presentations
(_Pres_s, _Pres_c) are grouped together. For consistency, floating point
hexadecimal presentation uses the same values as the integer ones.

New _Pres_p and setting for _M_alt enables using some spec to configure formatting
of uintptr_t with __formatter_int, and const void* with __formatter_ptr.
Differentiating it from _Pres_none would allow a future formatter<T*, _CharT>
that would require an explicit presentation type to be specified. This would allow
std::vector<T*> to be formatted directly with '{::p}' format spec.

The constructors for __formatter_int and __formatter_ptr from _Spec<_CharT>
now also set default presentation modes, as the format functions expect them.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
Declare as bit field in tail-padding.
* include/bits/formatfwd.h (__format::_Align): Defined as enum
class and add using enum.
* include/std/format (__format::_Pres_type, __format::_Sign)
(__format::_WidthPrec, __format::_Arg_t): Defined as enum class
and add using enum.
(_Pres_type::_Pres_esc): Replace with _Pres_max.
(_Pres_type::_Pres_seq, _Pres_type::_Pres_str): Remove.
(__format::_Pres_type): Updated values of enumerators as described
above.
(__format::_Spec): Rearranged members to have 8 bits of tail-padding.
(_Spec::_M_debug): Defined.
(_Spec::_M_reserved): Extended to 8 bits and moved at the end.
(_Spec::_M_reserved2): Removed.
(_Spec::_M_parse_fill_and_align, _Spec::_M_parse_sign)
(__format::__write_padded_as_spec): Adjusted default value checks.
(__format::_Term_char): Add using enum and adjust enumerators.
(__Escapes::_S_term): Adjusted for _Term_char values.
(__format::__should_escape_ascii): Adjusted _Term_char uses.
(__format::__write_escaped): Adjusted for _Term_char.
(__formatter_str::parse): Set _Pres_s if specified and _M_debug
instead of _Pres_esc.
(__formatter_str::set_debug_format): Set _M_debug instead of
_Pres_esc.
(__formatter_str::format, __formatter_str::_M_format_range):
Check _M_debug instead of _Pres_esc.
(__formatter_str::_M_format_escaped): Adjusted _Term_char uses.
(__formatter_int::__formatter_int(_Spec<_CharT>)): Set _Pres_d if
default presentation type is not set.
(__formatter_int::_M_parse): Adjusted default value checks.
(__formatter_int::_M_do_parse): Set _M_debug instead of _Pres_esc.
(__formatter_int::_M_format_character): Handle escaped presentation.
(__formatter_int::_M_format_character_escaped)
(__formatter_int::_S_character_width): Merged into
_M_format_character.
(__formatter_ptr::__formatter_ptr(_Spec<_CharT>)): Set _Pres_p if
default presentation type is not set.
(__formatter_ptr::parse): Add default __type parameter, store _Pres_p,
and handle _M_alt to be consistent with meaning for integers.
(__formatter_ptr<_CharT>::_M_set_default): Define.
(__format::__pack_arg_types, std::basic_format_args): Add necessary
casts.
(formatter<_CharT, _CharT>::set_debug_format)
(formatter<char, wchar_t>::set_debug_format): Set _M_debug instead of
_Pres_esc.
(formatter<_CharT, _CharT>::format, formatter<char, wchar_t>::format):
Simplify calls to _M_format_character.
(range_formatter<_Rg, _CharT>::parse): Replace _Pres_str with
_Pres_s and set _M_debug instead of _Pres_esc.
(range_formatter<_Rg, _CharT>::format): Replace _Pres_str with
_Pres_s.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

commit | commitdiff | tree

Jonathan Wakely [Tue, 20 May 2025 09:53:41 +0000 (10:53 +0100)]

libstdc++: Fix incorrect links to archived SGI STL docs

In r8-7777-g25949ee33201f2 I updated some URLs to point to copies of the
SGI STL docs in the Wayback Machine, because the original pages were no
longer hosted on sgi.com. However, I incorrectly assumed that if one
archived page was at https://web.archive.org/web/20171225062613/... then
all the other pages would be too. Apparently that's not how the Wayback
Machine works, and each page is archived on a different date. That meant
that some of our links were redirecting to archived copies of the
announcement that the SGI STL docs have gone away.

This fixes each URL to refer to a correctly archived copy of the
original docs.

libstdc++-v3/ChangeLog:

* doc/xml/faq.xml: Update URL for archived SGI STL docs.
* doc/xml/manual/containers.xml: Likewise.
* doc/xml/manual/extensions.xml: Likewise.
* doc/xml/manual/using.xml: Likewise.
* doc/xml/manual/utilities.xml: Likewise.
* doc/html/*: Regenerate.

commit | commitdiff | tree

Jakub Jelinek [Tue, 20 May 2025 07:36:58 +0000 (09:36 +0200)]

libgcc: Move bitint support exports to x86/aarch64 specific map files

When adding _BitInt support I was hoping all or most arches would
implement it already for GCC 14.  That didn't happen, and with
new hosts adding support for _BitInt for GCC 16 (s390x-linux and, as was
posted today, loongarch-linux too), we need the _BitInt support functions
exported on those arches at GCC_16.0.0 rather than GCC_14.0.0, which
shouldn't be changed anymore.

The following patch does that.  Both arches were already exporting
some of the _BitInt related symbols in their specific map files, this
just moves the remaining ones there as well.

2025-05-20  Jakub Jelinek  <jakub@redhat.com>

* libgcc-std.ver.in (GCC_14.0.0): Remove bitint related exports
from here.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Add them here.
* config/i386/libgcc-darwin.ver (GCC_14.0.0): Likewise.
* config/i386/libgcc-sol2.ver (GCC_14.0.0): Likewise.
* config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Likewise.

commit | commitdiff | tree

Jakub Jelinek [Tue, 20 May 2025 06:21:14 +0000 (08:21 +0200)]

tree-chrec: Use signed_type_for in convert_affine_scev

On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.

The patch fixes it similarly to what has been done for GCC 14 in various
other spots.

2025-05-20 Jakub Jelinek <jakub@redhat.com>

* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.

commit | commitdiff | tree

Jakub Jelinek [Tue, 20 May 2025 06:20:16 +0000 (08:20 +0200)]

libgcc: Small bitint_reduce_prec big-endian fixes

The big-endian _BitInt support in libgcc was written without any
testing, so I hadn't discovered that I'd made one mistake in it
(in multiple places).
The bitint_reduce_prec function attempts to optimize inputs
which have some larger precision but at runtime they are found
to need smaller number of limbs.
For little-endian that is handled just by returning smaller
precision (or negative precision for signed), but for
big-endian we need to adjust the passed in limb pointer so that
when it returns smaller precision the argument still contains
the least significant limbs for the returned precision.
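
The pointer adjustment can be pictured with a much simplified sketch. It is
not the libgcc code: it handles only unsigned values, counts whole limbs
instead of bits of precision, and reduce_limbs and limb_t are hypothetical
names. It takes a pointer to the caller's limb pointer so that, for big-endian
order, dropping a leading zero limb can advance the caller's pointer itself
(++*p) rather than a local copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using limb_t = std::uint64_t;

// Drop most significant zero limbs from an unsigned n-limb value and
// return the reduced limb count (at least 1).
std::size_t reduce_limbs(const limb_t **p, std::size_t n, bool big_endian)
{
  while (n > 1)
    {
      const limb_t most_significant = big_endian ? (*p)[0] : (*p)[n - 1];
      if (most_significant != 0)
        break;
      if (big_endian)
        ++*p;  // advance the caller's pointer, not a local copy
      --n;
    }
  return n;
}

bool reduce_limbs_demo()
{
  const limb_t le[4] = { 7, 5, 0, 0 };  // least significant limb first
  const limb_t be[4] = { 0, 0, 5, 7 };  // most significant limb first
  const limb_t *lp = le;
  const limb_t *bp = be;
  const std::size_t ln = reduce_limbs(&lp, 4, false);
  const std::size_t bn = reduce_limbs(&bp, 4, true);
  assert(ln == 2 && bn == 2);
  // Little-endian: pointer untouched.  Big-endian: two limbs skipped.
  return lp == le && bp == be + 2 && bp[0] == 5 && bp[1] == 7;
}
```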

2025-05-20 Jakub Jelinek <jakub@redhat.com>

* libgcc2.c (bitint_reduce_prec): For big endian
__LIBGCC_BITINT_ORDER__ use ++*p and --*p instead of
++p and --p.
* soft-fp/bitint.h (bitint_reduce_prec): Likewise.

commit | commitdiff | tree

Jakub Jelinek [Tue, 20 May 2025 06:18:58 +0000 (08:18 +0200)]

bitintlower: Big-endian lowering support

The following patch adds big endian support to the bitintlower pass.
While the rest of the _BitInt support has been written with endianness
in mind, I wrote the bitintlower pass solely for little endian
at the start, because the pass is large and complicated and there were
no big-endian backends with _BitInt psABI at the point of writing it,
so the big-endian support would have been completely untested.
Now that I have privately received a patch to add s390x support, I went through
the whole pass and added the support.
Some months ago I've talked about two possibilities to do the big-endian
support, one perhaps easier would be keep the idx vars (INTEGER_CSTs
for bitint_prec_large and partially SSA_NAMEs, partially INTEGER_CSTs
for bitint_prec_huge) iterating like for little-endian from 0 upwards
and do the big-endian index correction only when accessing the limbs
(but mergeable casts between _BitInts with different number of limbs
would be a nightmare), which would have the disadvantage that we'd need
to wait until propagation and ivopts to fix stuff up (and not sure it
would be able to fix everything), or change stuff so that the idxes
used between the different bitint_large_huge class methods iterate on
big endian from highest down to 0.
The following patch implements the latter.
On s390x with the 3 patches from IBM without this patch I got on
make -j32 -k check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c
pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* vect.exp='vect-early-break_99-pr113287.c'
tree-ssa.exp=pr113735.c"
347 FAILs, 326 out of that execution failures (and that doesn't include
some tests that happened to succeed by pure luck because e.g. comparisons
weren't implemented correctly).
With this patch (and 2 small patches I'm going to post next) I got this
down to
FAIL: gcc.dg/dfp/bitint-1.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-2.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-3.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-4.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-5.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-6.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-8.c (test for excess errors)
FAIL: gcc.dg/torture/bitint-64.c   -O3 -g  execution test
FAIL: gcc.dg/torture/bitint-64.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
where the dfp stuff is due to missing DPD dfp <-> _BitInt support
I'm working on next, and bitint-64.c is some expansion related
issue with _Atomic _BitInt(5) (will look at it later, but there
bitint lowering isn't involved at all).
Most of the patch just tweaks things so that it iterates in the
right direction, for casts with different number of limbs does the
needed index adjustments and unfortunately due to that (and e.g.
add/sub/mul overflow BE lowering) has some pessimizations on the
SSA conflict side; on little-endian mergeable ops have the
advantage that all the accesses iterate from index 0 up, so even
if there is e.g. overlap between the lhs and some used values, except
for mul/div where libgcc APIs require no overlap we don't need to
avoid it, all the limbs are updated in sync before going on to handle
next limb.  On big-endian, that isn't the case, casts etc. can result
in index adjustments and so we could overwrite a limb that will still
need to be processed as input.  So, there is a special case that looks
for different numbers of limbs among arguments and in that case marks
the lhs as conflicting with the inputs.
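
The iteration direction described above can be sketched with a limb-wise
addition. This is a hedged illustration in plain C++ and not code from the
pass; add_limbs and limb_t are hypothetical names. The carry must always flow
from the least significant limb upward, so a little-endian layout walks
indexes from 0 up while a big-endian layout walks from the highest index down
to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using limb_t = std::uint32_t;

// Add two n-limb numbers limb by limb; returns the carry out.
bool add_limbs(limb_t *dst, const limb_t *a, const limb_t *b,
               std::size_t n, bool big_endian)
{
  bool carry = false;
  for (std::size_t i = 0; i < n; ++i)
    {
      // Step i always handles the i-th least significant limb.
      const std::size_t idx = big_endian ? n - 1 - i : i;
      const std::uint64_t sum = std::uint64_t{a[idx]} + b[idx] + carry;
      dst[idx] = static_cast<limb_t>(sum);
      carry = (sum >> 32) != 0;
    }
  return carry;
}

bool add_limbs_demo()
{
  // 0x1FFFFFFFF plus 1 equals 0x200000000 in both layouts.
  const limb_t le_a[2] = { 0xFFFFFFFFu, 1u }, le_b[2] = { 1u, 0u };
  const limb_t be_a[2] = { 1u, 0xFFFFFFFFu }, be_b[2] = { 0u, 1u };
  limb_t le_sum[2], be_sum[2];
  const bool le_carry = add_limbs(le_sum, le_a, le_b, 2, false);
  const bool be_carry = add_limbs(be_sum, be_a, be_b, 2, true);
  assert(!le_carry && !be_carry);
  return le_sum[0] == 0 && le_sum[1] == 2
         && be_sum[0] == 2 && be_sum[1] == 0;
}
```

In this sketch both inputs have the same number of limbs, so dst may alias an
input safely. The conflict case described above arises when operands have
different limb counts and the index adjustment makes reads and writes touch
different positions.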

On little-endian, this patch shouldn't affect code generation, with
one little exception; in the separate_ext handling in lower_mergeable_stmt
the loop (if bitint_large_huge) was iterating using some idx and then
if bo_idx was non-zero, adding that constant to a new SSA_NAME and using
that to do the limb accesses.  As the limb accesses are the only place
where the idx is used (apart from the loop exit test), I've changed it
to iterate on idxes with bo_idx already added to those.

P.S., would be nice to eventually also enable big-endian aarch64,
but I don't have access to those, so can't test that myself.

P.S., at least in the current s390x patches it wants info->extended_p,
this patch doesn't change anything about that.  I believe most of the
time the _BitInt vars/params/return values are already extended, but
there is no testcase coverage for that, I will work on that incrementally
(and then perhaps arm 32-bit _BitInt support can be enabled too).

2025-05-20  Jakub Jelinek  <jakub@redhat.com>

* gimple-lower-bitint.cc (bitint_big_endian): New variable.
(bitint_precision_kind): Set it.
(struct bitint_large_huge): Add unsigned argument to
finish_arith_overflow.
(bitint_large_huge::limb_access_type): Handle bitint_big_endian.
(bitint_large_huge::handle_operand): Likewise.
(bitint_large_huge::handle_cast): Likewise.
(bitint_large_huge::handle_bit_field_ref): Likewise.
(bitint_large_huge::handle_load): Likewise.
(bitint_large_huge::lower_shift_stmt): Likewise.
(bitint_large_huge::finish_arith_overflow): Likewise.
Add nelts argument.
(bitint_large_huge::lower_addsub_overflow): Handle bitint_big_endian.
Adjust finish_arith_overflow caller.
(bitint_large_huge::lower_mul_overflow): Likewise.
(bitint_large_huge::lower_bit_query): Handle bitint_big_endian.
(bitint_large_huge::lower_stmt): Likewise.
(build_bitint_stmt_ssa_conflicts): Likewise.
(gimple_lower_bitint): Likewise.

* gcc.dg/torture/bitint-78.c: New test.
* gcc.dg/torture/bitint-79.c: New test.
* gcc.dg/torture/bitint-80.c: New test.
* gcc.dg/torture/bitint-81.c: New test.

commit | commitdiff | tree

Nathaniel Shead [Mon, 19 May 2025 12:11:13 +0000 (22:11 +1000)]

c++/modules: Ensure vtables are emitted when needed [PR120349]

I missed a testcase in r16-688-gc875748cdc468e for whether a GM vtable
should be emitted in an importer when it has no non-inline key function.
Before that patch the code worked because we always marked all vtables
as DECL_EXTERNAL, which then meant that reading the definition marked
them as DECL_NOT_REALLY_EXTERN.

This patch restores the old behaviour so that vtables are marked
DECL_EXTERNAL (and hence DECL_NOT_REALLY_EXTERN).

PR c++/120349

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Always mark vtables as
DECL_EXTERNAL.

gcc/testsuite/ChangeLog:

* g++.dg/modules/vtt-3_a.C: New test.
* g++.dg/modules/vtt-3_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

commit | commitdiff | tree

Jeff Law [Tue, 20 May 2025 02:31:27 +0000 (20:31 -0600)]

[RISC-V] Avoid multiple assignments to output object

This is the next batch of changes to reduce multiple assignments to an output
object.  This time I'm focused on splitters in bitmanip.md.

This doesn't convert every case.  For example there is one case that is very
clearly dependent on eliminating mvconst_internal and adjustment of a splitter
for andn and until those things happen it would clearly be a QOI implementation
regression.

There are cases where we set a scratch register more than once.  It may be
possible to use an additional scratch.  I haven't tried that yet.

I've seen one failure to if-convert a sequence after this patch, but it should
be resolved once the logical AND changes are merged.  Otherwise I'm primarily
seeing slight differences in register allocation and scheduling.  Nothing
concerning to me.

This has run through my tester, but I obviously want to see how it behaves in
the upstream CI system as that tests slightly different multilibs than mine (on
purpose).

gcc/

* config/riscv/bitmanip.md (various splits): Avoid writing the output
more than once when trivially possible.

commit | commitdiff | tree

Nathaniel Shead [Sat, 17 May 2025 13:51:07 +0000 (23:51 +1000)]

c++/modules: Fix ICE on merge of instantiation with partial spec [PR120013]

When we import a pending instantiation that matches an existing partial
specialisation, we don't find the slot in the entity map because for
partial specialisations we register the TEMPLATE_DECL but for normal
implicit instantiations we instead register the inner TYPE_DECL.

Because the DECL_MODULE_ENTITY_P flag is set we correctly realise that
it is in the entity map, but ICE when attempting to use that slot in
partition handling.

This patch fixes the issue by detecting this case and instead looking
for the slot for the TEMPLATE_DECL. It doesn't matter that we never add
a slot for the inner decl because we're about to discard it anyway.

PR c++/120013

gcc/cp/ChangeLog:

* module.cc (trees_in::install_entity): Handle re-registering
the inner TYPE_DECL of a partial specialisation.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-8.h: New test.
* g++.dg/modules/partial-8_a.C: New test.
* g++.dg/modules/partial-8_b.C: New test.
* g++.dg/modules/partial-8_c.C: New test.
* g++.dg/modules/partial-8_d.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

commit | commitdiff | tree

Nathaniel Shead [Mon, 19 May 2025 13:17:16 +0000 (23:17 +1000)]

c++/modules: Always mark tinfo vars as TREE_ADDRESSABLE [PR120350]

We need to mark type info decls as addressable if we take them by
reference; this is done by walking the declaration during parsing and
marking the decl as needed.

However, with modules we don't stream tinfo decls directly; rather we
stream just their name and type and reconstruct them in the importer
directly. This means that any addressable flags are not propagated, and
we error because TREE_ADDRESSABLE is not set despite taking its address.

But tinfo decls should always have TREE_ADDRESSABLE set, as any attempt
to use the tinfo decl will go through build_address anyway. So this
patch fixes the issue by eagerly marking the constructed decl as
TREE_ADDRESSABLE so that modules gets this flag correctly set as well.

PR c++/120350

gcc/cp/ChangeLog:

* rtti.cc (get_tinfo_decl_direct): Mark TREE_ADDRESSABLE.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tinfo-3_a.H: New test.
* g++.dg/modules/tinfo-3_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

commit | commitdiff | tree

liuhongt [Mon, 12 May 2025 06:21:30 +0000 (23:21 -0700)]

Extend vect_recog_cond_expr_convert_pattern to handle REAL_CST

REAL_CST is handled if it can be represented in different floating
point types without loss of precision or under fast math.
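
The "without loss of precision" condition can be pictured with a round-trip
test. The sketch below is an illustration only and not the check GCC performs
on REAL_CST nodes; exactly_representable_as_float is a hypothetical helper
that asks whether a double constant survives conversion to float and back.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if d converts to float and back without changing value.
bool exactly_representable_as_float(double d)
{
  // Converting an out-of-range double to float is undefined behaviour in
  // C++, so reject such values first (this also rejects infinities, which
  // keeps the sketch simple).
  if (std::fabs(d) > std::numeric_limits<float>::max())
    return false;
  return static_cast<double>(static_cast<float>(d)) == d;
}

bool real_cst_demo()
{
  assert(exactly_representable_as_float(0.5));
  assert(exactly_representable_as_float(-3.0));
  // 0.1 has no finite binary expansion, so float and double round it
  // differently.
  return !exactly_representable_as_float(0.1);
}
```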

gcc/ChangeLog:

PR tree-optimization/103771
* match.pd (cond_expr_convert_p): Extend the match to handle
REAL_CST.
* tree-vect-patterns.cc
(vect_recog_cond_expr_convert_pattern): Handle REAL_CST.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103771-5.c: New test.
* gcc.target/i386/pr103771-6.c: New test.

commit | commitdiff | tree

Pan Li [Mon, 19 May 2025 02:06:35 +0000 (10:06 +0800)]

RISC-V: Tweak the asm check test of vx combine on GR2VR cost [NFC]

Tweak the asm check to define T as the element type (for example uint8_t),
so that more vx tests can be added easily and with less chance of mistakes.

The following test suite passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Extract
define T as type for testing.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 12:09:05 +0000 (20:09 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 2

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Add asm check
for vrsub with GR2VR cost 2.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 12:02:11 +0000 (20:02 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 1

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Add asm check
for vrsub with GR2VR cost 1.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 11:53:46 +0000 (19:53 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 1 with GR2VR cost 0

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vrsub case 1 with GR2VR cost 0.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 09:17:46 +0000 (17:17 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 15

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Add asm check
for vrsub with GR2VR cost 15.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 09:07:37 +0000 (17:07 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 1

Add asm dump check test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Add vrsub asm
dump check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 08:49:29 +0000 (16:49 +0800)]

RISC-V: Add test for vec_duplicate + vrsub.vv combine case 0 with GR2VR cost 0

Add asm dump check and run test for vec_duplicate + vrsub.vv combine to vrsub.vx.

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add vrsub asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper
macros for vx binary reversed.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vrsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vrsub-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Sun, 18 May 2025 08:41:01 +0000 (16:41 +0800)]

RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost

This patch combines vec_duplicate + vrsub.vv into vrsub.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than
zero.

Assume we have example code like the following, with a GR2VR cost of 0.

  #define DEF_VX_BINARY_REVERSE_CASE_0(T, OP, NAME)                   \
  void                                                                \
  test_vx_binary_reverse_##NAME##_##T##_case_0 (T * restrict out,     \
                                                T * restrict in, T x, \
                                                unsigned n)           \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = x OP in[i];                                            \
  }

  DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub)

Before this patch:
  test_vx_binary_reverse_rsub_int32_t_case_0:
      beq a3,zero,.L27
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L22:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vsub.vv v1,v2,v1
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L22

After this patch:
  test_vx_binary_reverse_rsub_int32_t_case_0:
      beq a3,zero,.L27
      slli    a3,a3,32
      srli    a3,a3,32
  .L22:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vrsub.vx    v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L22

The test suites below pass with this patch.
* The rv64gcv full regression test.
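
For reference, the scalar semantics that the combined instruction has to preserve can be sketched in plain C++. This is a hand-written, hypothetical equivalent of the reverse-binary test loop for int32_t and '-' (the name `reverse_sub_i32` is an assumption, not taken from the testsuite). The scalar is the left operand of the subtraction, which is why the combined form is a reverse subtract (vrsub.vx) rather than vsub.vx.

```cpp
#include <cstdint>

// Hypothetical hand-expansion of the reverse-binary loop for T = int32_t
// and OP = '-'.  The scalar x is the minuend, so each element computes
// x - in[i]; this is the operand order that vrsub.vx provides.
// __restrict is a compiler extension accepted by GCC and Clang.
void reverse_sub_i32(int32_t *__restrict out, const int32_t *__restrict in,
                     int32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = x - in[i];
}
```

With x = 5 and in = {1, 2, 3, 10}, the loop stores {4, 3, 2, -5}.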

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Leverage the newly added func to
expand the vx insn.
* config/riscv/riscv-protos.h (expand_vx_binary_vec_dup_vec): Add
new func decl to expand format v = vop(vec_dup(x), v).
(expand_vx_binary_vec_vec_dup): Ditto but for format
v = vop(v, vec_dup(x)).
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
func impl to expand vx for v = vop(vec_dup(x), v).
(expand_vx_binary_vec_vec_dup): Ditto but for another format
v = vop(v, vec_dup(x)).

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

GCC Administrator [Tue, 20 May 2025 00:18:27 +0000 (00:18 +0000)]

Daily bump.

commit | commitdiff | tree

Jeff Law [Mon, 19 May 2025 22:55:15 +0000 (16:55 -0600)]

[committed][RISC-V][PR target/120333] Remove bogus bext pattern

I goof'd when doing analysis of missed bext cases.  For the shift into the sign
bit, then shift into the low bit case (thankfully the least common), I got it
in my brain that the field is at the left shift count.   It's actually at
word_size - 1 - left shift count.

Once the subtraction is included, it's no longer profitable to turn those cases
into bext.  Best case scenario would be sub+bext, but we can just as easily use
sll+srl which fuses in some designs into a single op.

So this patch removes those two patterns, adjusts the existing testcase and
adds the new execution test.

Given it's a partial reversion and has passed in my tester, I'm going to go
ahead and push it to the trunk rather than waiting for upstream CI.
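
The bit-position arithmetic described above can be checked with a short sketch. The function names are hypothetical and this is not the removed pattern itself; it only models the shift pair on a 64-bit word and compares it with a single-bit extract.

```cpp
#include <cstdint>

// Shift a bit into the sign position, then logically shift it down to
// bit 0.  On a 64-bit word this yields bit (63 - left), not bit 'left'.
// Requires left < 64.
constexpr unsigned shift_pair_extract(uint64_t x, unsigned left)
{
  return static_cast<unsigned>((x << left) >> 63);
}

// What a single-bit extract at index 'pos' computes (pos < 64).
constexpr unsigned bit_extract(uint64_t x, unsigned pos)
{
  return static_cast<unsigned>((x >> pos) & 1);
}
```

For a word with only bit 60 set, `shift_pair_extract(x, 3)` and `bit_extract(x, 60)` both return 1, while `bit_extract(x, 3)` returns 0. Matching the shift pair with a bit extract therefore needs the extra 63 - left computation that the message mentions.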

PR target/120333
gcc/
* config/riscv/bitmanip.md: Remove bext formed from left+right
shift patterns.

gcc/testsuite/

* gcc.target/riscv/pr114512.c: Update expected output.
* gcc.target/riscv/pr120333.c: New test.

commit | commitdiff | tree

John David Anglin [Mon, 19 May 2025 21:28:00 +0000 (17:28 -0400)]

hpux: Fix detection of atomic support when profiling

The pa target lacks atomic sync compare and swap instructions.
These are implemented as libcalls and in libatomic. As on linux,
we lie about their availability.

This fixes the gcov-30.c test on hppa64-hpux11.

2025-05-19 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa-hpux.h (TARGET_HAVE_LIBATOMIC): Define.
(HAVE_sync_compare_and_swapqi): Likewise.
(HAVE_sync_compare_and_swaphi): Likewise.
(HAVE_sync_compare_and_swapsi): Likewise.
(HAVE_sync_compare_and_swapdi): Likewise.

commit | commitdiff | tree

Thomas Schwinge [Thu, 15 May 2025 16:11:16 +0000 (18:11 +0200)]

'TYPE_EMPTY_P' vs. code offloading [PR120308]

We've got 'gcc/stor-layout.cc:finalize_type_size':

/* Handle empty records as per the x86-64 psABI. */
TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);

(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P',
calling 'gcc/tree.cc:default_is_empty_record'.)

And so it happens that for an empty struct used in code offloaded from x86_64
host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in
offloading compilation (where the offload targets (currently?) don't use it
themselves, and therefore aren't prepared to handle it).

For nvptx offloading compilation, this causes wrong code generation:
'ptxas [...] error : Call has wrong number of parameters', as nvptx code
generation for function definition doesn't pay attention to this flag (say, in
'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be
appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call'
via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus
get mismatching function definition vs. function call.

This issue apparently isn't a problem for GCN offloading, but I don't know if
that's by design or by accident.
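
As a rough illustration of the kind of code affected, consider a function that takes an empty struct by value. This is a hypothetical sketch, not the PR120308 testcase, and it contains no offloading directives. It only shows the ABI question at stake: the side that emits the function definition and the side that emits the call must agree on whether the empty argument occupies a parameter slot.

```cpp
// Hypothetical example: 'empty_t' has no members, so a target that
// honours TYPE_EMPTY_P may drop the first argument from the call
// sequence.  Definition and call site must then make the same decision.
struct empty_t {};

int add_with_empty(empty_t, int a, int b)
{
  return a + b;
}
```

On a single target both sides agree, so `add_with_empty(empty_t{}, 3, 4)` simply returns 7. The reported failure arises when the flag computed for the host is seen by offload code generation that does not consistently honour it.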

Richard Biener:
> It looks like TYPE_EMPTY_P is only used during RTL expansion for ABI
> purposes, so computing it during layout_type is premature as shown here.
>
> I would suggest to simply re-compute it at offload stream-in time.

(For avoidance of doubt, the additions to 'gcc.target/nvptx/abi-struct-arg.c',
'gcc.target/nvptx/abi-struct-ret.c' are not dependent on the offload streaming
code changes, but are just to mirror the changes to
'libgomp.oacc-c-c++-common/abi-struct-1.c'.)

PR lto/120308
gcc/
* lto-streamer-out.cc (hash_tree): Don't handle 'TYPE_EMPTY_P' for
'lto_stream_offload_p'.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c: Add empty
structure testing.
gcc/testsuite/
* gcc.target/nvptx/abi-struct-arg.c: Add empty structure testing.
* gcc.target/nvptx/abi-struct-ret.c: Likewise.

commit | commitdiff | tree

Thomas Schwinge [Thu, 15 May 2025 16:10:05 +0000 (18:10 +0200)]

Add 'libgomp.c-c++-common/target-abi-struct-1-O0.c', 'libgomp.oacc-c-c++-common/abi-struct-1.c'

libgomp/
* testsuite/libgomp.c-c++-common/target-abi-struct-1-O0.c: New.
* testsuite/libgomp.oacc-c-c++-common/abi-struct-1.c: Likewise.

commit | commitdiff | tree

Julian Brown [Tue, 3 Sep 2019 14:57:05 +0000 (07:57 -0700)]

Fix libgomp.oacc-fortran/lib-13.f90 async bug

libgomp/
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
wait API calls.

commit | commitdiff | tree

Jeff Law [Mon, 19 May 2025 18:00:56 +0000 (12:00 -0600)]

[RISC-V] Fix false positive from Wuninitialized

As Mark and I independently tripped, there's a Wuninitialized issue in the
RISC-V backend. While *I* know the value would always be properly initialized,
it'd be somewhat painful to either eliminate the infeasible paths or do deep
enough analysis to suppress the false positive.

So this initializes OUTPUT and verifies it's got a reasonable value before
using it for the final copy into operands[0].
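
The general shape of such a fix can be sketched as follows. The function and variable names are hypothetical and the standard `assert` stands in for whatever check the backend uses; this is not the code from synthesize_ior_xor.

```cpp
#include <cassert>

// Hypothetical sketch: 'output' is assigned on every path that callers
// can actually reach, but the compiler cannot prove that 'kind' is
// always 0 or 1.  Giving it an initial value removes the warning, and
// the check documents (and enforces) the assumption before use.
const int *select_output(int kind, const int *a, const int *b)
{
  const int *output = nullptr;

  if (kind == 0)
    output = a;
  else if (kind == 1)
    output = b;

  assert(output != nullptr);
  return output;
}
```

The trade-off is a cheap runtime check in exchange for not having to restructure the control flow to make the infeasible paths obvious to the analysis.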

Bootstrapped on the BPI (regression testing still has ~12hrs to go).

gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): Initialize OUTPUT and
verify it's non-null before emitting the final copy insn.

commit | commitdiff | tree

Harald Anlauf [Sun, 18 May 2025 20:42:26 +0000 (22:42 +0200)]

Fortran: fix FAIL of gfortran.dg/specifics_1.f90 after r16-372 [PR120099]

After commit r16-372, testcase gfortran.dg/specifics_1.f90 started to
FAIL at -O2 and higher, as DCE led to elimination of evaluations of
Fortran specific intrinsics returning complex results and with -ff2c.
As the Fortran runtime library is compiled with -fno-f2c, the frontend
generates calls to wrapper subroutines _gfortran_f2c_specific_* that
return their result by reference via their first argument when this is
needed. This is e.g. the case when specific names of the intrinsics are
used for passing as actual argument to procedures. These wrappers are
not pure in the GCC IR sense, even if the Fortran intrinsics are.
Therefore gfc_return_by_reference must return true for these.

PR fortran/120099

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_return_by_reference): Intrinsic functions
returning complex numbers may return their result by reference
with -ff2c.

commit | commitdiff | tree

Richard Earnshaw [Mon, 19 May 2025 15:19:39 +0000 (16:19 +0100)]

arm: fully validate mem_noofs_operand [PR120351]

It's not enough to just check that a memory operand is of the form
mem(reg); after RA we also need to validate the register being used.
The safest way to do this is to call memory_operand.

PR target/120351

gcc/ChangeLog:

* config/arm/predicates.md (mem_noofs_operand): Also check the op
is a valid memory_operand.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr120351.c: New test.

commit | commitdiff | tree

Jonathan Wakely [Fri, 16 May 2025 10:54:46 +0000 (11:54 +0100)]

libstdc++: Fix some Clang -Wsystem-headers warnings in <ranges>

libstdc++-v3/ChangeLog:

* include/std/ranges (_ZipTransform::operator()): Remove name of
unused parameter.
(chunk_view::_Iterator, stride_view::_Iterator): Likewise.
(join_with_view): Declare _Iterator and _Sentinel as class
instead of struct.
(repeat_view): Declare _Iterator as class instead of struct.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

commit | commitdiff | tree

Jonathan Wakely [Thu, 15 May 2025 18:32:01 +0000 (19:32 +0100)]

libstdc++: Fix std::format of chrono::local_days with {} [PR120293]

Formatting of chrono::local_days with an empty chrono-specs should be
equivalent to inserting it into an ostream, which should use the
overload for inserting chrono::sys_days into an ostream. The
implementation of empty chrono-specs in _M_format_to_ostream takes some
short cuts, and that wasn't being done correctly for chrono::local_days.

libstdc++-v3/ChangeLog:

PR libstdc++/120293
* include/bits/chrono_io.h (_M_format_to_ostream): Add special
case for local_time convertible to local_days.
* testsuite/std/time/clock/local/io.cc: Check formatting of
chrono::local_days.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

commit | commitdiff | tree

Dongyan Chen [Mon, 19 May 2025 07:17:12 +0000 (15:17 +0800)]

RISC-V: Fix the warning of temporary object dangling references.

During the GCC build, some warnings about dangling references to temporary
objects emerged. They appeared in these lines of riscv-common.cc:
`const riscv_ext_info_t &implied_ext_info`,
`const riscv_ext_info_t &ext_info = get_riscv_ext_info (ext)` and
`auto &ext_info = get_riscv_ext_info (search_ext)`.
The issue arose because the local variable types were not used in a standardized
way, causing their references to dangle once the function ended.
To fix this, the patch changes the argument type of get_riscv_ext_info to
`const char *`, thereby eliminating the warnings.
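
A sketch of the general pattern, under the assumption that the original parameter was a reference type that a `const char *` argument reaches through a temporary (the message does not show the old signature). All names below are hypothetical. GCC's -Wdangling-reference is a heuristic: when a function returns a reference and one of its reference parameters is bound to a temporary, the compiler assumes the result may refer into that temporary.

```cpp
#include <cstring>
#include <string>

struct ext_info { const char *name; int version; };

static const ext_info ext_table[] = { {"zba", 1}, {"zbb", 2} };
static const ext_info unknown_ext = {"", 0};

// Called with a string literal, this binds 'name' to a temporary
// std::string.  The returned reference actually points into a static
// table, but a caller that stores it in a reference variable can still
// trigger the heuristic warning.
const ext_info &lookup_by_string(const std::string &name)
{
  for (const ext_info &e : ext_table)
    if (name == e.name)
      return e;
  return unknown_ext;
}

// Taking the name as a plain pointer by value involves no temporary
// object, so there is nothing for the result to dangle on.
const ext_info &lookup_by_cstr(const char *name)
{
  for (const ext_info &e : ext_table)
    if (std::strcmp(name, e.name) == 0)
      return e;
  return unknown_ext;
}
```

Both functions return the same entries; only the parameter type differs, which is the aspect the commit changes.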

Changes for v2:
- Change the argument type of get_riscv_ext_info to `const char *` to eliminate the warnings.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (get_riscv_ext_info): Fix argument type.
(riscv_subset_list::check_implied_ext): Type conversion.

commit | commitdiff | tree

zhusonghe [Mon, 19 May 2025 02:43:48 +0000 (10:43 +0800)]

RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cc

The variables `major` and `minor` in `gen-riscv-ext-texi.cc`
conflict with the macros of the same name defined in `<sys/sysmacros.h>`,
which are exposed when building with newer versions of GCC on older
Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them
to `major_version` and `minor_version` respectively. This aligns with the
GCC community's recommended practice [1] and improves code clarity.
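
Why the old names are fragile can be seen in a small sketch. The struct below is a hypothetical stand-in and not the contents of gen-riscv-ext-texi.cc. `major` and `minor` in `<sys/sysmacros.h>` are function-like macros, so any use of those identifiers followed by an opening parenthesis, such as a member initializer, is rewritten by the preprocessor when the macros are visible.

```cpp
// Hypothetical version record using names that cannot collide with the
// function-like macros major(dev) and minor(dev).
struct version_t {
  int major_version;
  int minor_version;

  // With members called 'major' and 'minor', initializers written as
  // major(...) and minor(...) would be expanded as macro calls.
  constexpr version_t(int major_v, int minor_v)
    : major_version(major_v), minor_version(minor_v) {}
};

// Order versions by major number first, then by minor number.
constexpr bool version_less(const version_t &a, const version_t &b)
{
  return a.major_version != b.major_version
           ? a.major_version < b.major_version
           : a.minor_version < b.minor_version;
}
```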

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683881.html

gcc/ChangeLog:

* config/riscv/gen-riscv-ext-texi.cc (struct version_t): Rename
major/minor to major_version/minor_version.

Signed-off-by: Songhe Zhu <zhusonghe@eswincomputing.com>

commit | commitdiff | tree

Kito Cheng [Mon, 12 May 2025 09:38:39 +0000 (02:38 -0700)]

RISC-V: Support Zilsd code gen

This commit adds the code gen support for Zilsd, which is a
newly added extension for RISC-V. The Zilsd extension allows
for loading and storing 64-bit values using even-odd register
pairs.

We only try to do minimal code gen support for now, which means the new
instructions are only used when the load or store is of 64-bit data. They
could also be used to optimize the code gen of memcpy/memset/memmove and
the prologue and epilogue of functions, but I think that should probably
be done in a follow-up patch.
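
The even-odd pairing rule can be sketched as a simple predicate. This is a hypothetical illustration, not the logic in riscv_hard_regno_mode_ok: it assumes 32 general purpose registers numbered 0 to 31 and ignores any special cases the extension defines for particular registers.

```cpp
// Hypothetical predicate: true when 'regno' can be the first register
// of an even-odd pair (regno, regno + 1) within registers 0..31.
constexpr bool is_even_odd_pair_base(unsigned regno)
{
  return regno < 31 && (regno % 2) == 0;
}
```

Under this rule register 10 can start a pair (10, 11), while register 11 cannot.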

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Handle
load/store with even-odd reg pair.
(riscv_split_64bit_move_p): Don't split load/store if zilsd enabled.
(riscv_hard_regno_mode_ok): Only allow even regs to be used for
64-bit modes with zilsd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zilsd-code-gen.c: New test.

commit | commitdiff | tree

Jennifer Schmitz [Thu, 15 May 2025 14:16:15 +0000 (07:16 -0700)]

regcprop: Return from copy_value for unordered modes

The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
partial_subreg_p in the function copy_value during the RTL pass
regcprop, failing the assertion in

inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
  /* Modes involved in a subreg must be ordered.  In particular, we must
     always know at compile time whether the subreg is paradoxical.  */
  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
  return maybe_lt (outer_prec, inner_prec);
}

Returning from the function if the modes are not ordered before reaching
the call to partial_subreg_p resolves the ICE and passes bootstrap and
testing without regression.
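
What "not ordered" means here can be shown with a simplified model. This is a sketch only: `poly2` is a hypothetical two-coefficient stand-in for GCC's poly_int, and the example precisions (32 + 32N for the variable-length mode, a constant 64 for the fixed-length one) are illustrative assumptions. A value is c0 + c1 * N, where N >= 0 is only known at run time.

```cpp
// Hypothetical two-coefficient polynomial: value = c0 + c1 * N, N >= 0.
struct poly2 { long c0; long c1; };

// a <= b holds for every N >= 0 exactly when both coefficients compare <=.
constexpr bool known_le(poly2 a, poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// Two values are ordered when one is known to be <= the other for all N.
constexpr bool ordered_p(poly2 a, poly2 b)
{
  return known_le(a, b) || known_le(b, a);
}
```

With these definitions `ordered_p(poly2{32, 32}, poly2{64, 0})` is false: 32 + 32N is smaller than 64 for N = 0 and larger for N = 2. A guard of the form "if the two precisions are not ordered, return" placed before the call to partial_subreg_p avoids reaching the assertion for such mode pairs.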
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR middle-end/120276
* regcprop.cc (copy_value): Return in case of unordered modes.

gcc/testsuite/
PR middle-end/120276
* gcc.dg/torture/pr120276.c: New test.

commit | commitdiff | tree

Kito Cheng [Mon, 12 May 2025 06:36:07 +0000 (14:36 +0800)]

RISC-V: Add new operand constraint: cR

This commit introduces a new operand constraint `cR` for the RISC-V
architecture, which allows the use of an even-odd RVC general purpose register
(x8-x15) in inline asm.

Ref: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/102

gcc/ChangeLog:

* config/riscv/constraints.md (cR): New constraint.
* doc/md.texi (Machine Constraints::RISC-V): Document the new cR
constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/constraint-cR-pair.c: New test case.

commit | commitdiff | tree

Haochen Jiang [Tue, 25 Mar 2025 07:42:14 +0000 (15:42 +0800)]

i386: Combine AVX10.2 compile time test

Since AVX10.2 enables everything, there is no need to split testcases
for 256 and 512 bit size.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-bf16-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-bf16-1.c: ... to this.
* gcc.target/i386/avx10_2-512-bf16-vector-cmp-1.c: Removed and
combined ...
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: ... to this.
* gcc.target/i386/avx10_2-512-bf16-vector-fma-1.c: Removed and
combined ...
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: ... to this.
* gcc.target/i386/avx10_2-512-bf16-vector-operations-1.c: Removed
and combined ...
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: ... to this.
* gcc.target/i386/avx10_2-512-bf16-vector-smaxmin-1.c: Removed
and combined ...
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: ... to this.
* gcc.target/i386/avx10_2-512-convert-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-convert-1.c: ... to this.
* gcc.target/i386/avx10_2-512-media-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-media-1.c: ... to this.
* gcc.target/i386/avx10_2-512-minmax-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-minmax-1.c: ... to this.
* gcc.target/i386/avx10_2-512-movrs-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-movrs-1.c: ... to this.
* gcc.target/i386/avx10_2-512-satcvt-1.c: Removed and combined ...
* gcc.target/i386/avx10_2-satcvt-1.c: ... to this.
* gcc.target/i386/sm4-avx10_2-512-1.c: Move to...
* gcc.target/i386/sm4-avx10_2-1b.c: ...here.

commit | commitdiff | tree

Haochen Jiang [Mon, 24 Mar 2025 09:02:44 +0000 (17:02 +0800)]

i386: Refactor AVX10.2 runtime test

Since everything is under avx10.2, the runtime tests can use a header
file holding the test body plus a file that actually runs all the
tests.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10-check.h: Remove AVX10_512BIT.
* gcc.target/i386/avx10-minmax-helper.h: Ditto.
* gcc.target/i386/avx10_2-vaddbf16-2.c: Add 512 test.
* gcc.target/i386/avx10_2-vcmpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vdivbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfpclassbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetexpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmaxbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vmulbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vrcpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vreducebf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrndscalebf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrsqrtbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vscalefbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsqrtbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsubbf16-2.c: Ditto.
* gcc.target/i386/avx512f-helper.h: Remove AVX10_512BIT.
* gcc.target/i386/sm4-check.h: Use AVX10_2.
* gcc.target/i386/avx10_2-512-vaddbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vaddbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcmpbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcmpbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbf162ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbf162iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvthf82ph-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2bf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2hf8-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtph2iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtps2ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvtps2iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttbf162ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttbf162iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttph2ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttph2iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2dqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2ibs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2iubs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2qqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2udqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vcvttps2uqqs-2.h: ...here.
* gcc.target/i386/avx10_2-512-vdivbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vdivbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vdpphps-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vdpphps-2.h: ...here.
* gcc.target/i386/avx10_2-512-vfmaddXXXbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vfmsubXXXbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vfnmaddXXXbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vfnmsubXXXbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vfpclassbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vfpclassbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vgetexpbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vgetexpbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vgetmantbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vgetmantbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vmaxbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vmaxbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vminbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vminbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vminmaxbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vminmaxpd-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vminmaxpd-2.h: ...here.
* gcc.target/i386/avx10_2-512-vminmaxph-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vminmaxph-2.h: ...here.
* gcc.target/i386/avx10_2-512-vminmaxps-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vminmaxps-2.h: ...here.
* gcc.target/i386/avx10_2-512-vmpsadbw-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vmpsadbw-2.h: ...here.
* gcc.target/i386/avx10_2-512-vmulbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vmulbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbssd-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbssds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbssds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbsud-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbsud-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbsuds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbsuds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbuud-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbuud-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpbuuds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpbuuds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwsud-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwsud-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwsuds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwusd-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwusd-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwusds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwusds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwuud-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwuud-2.h: ...here.
* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vpdpwuuds-2.h: ...here.
* gcc.target/i386/avx10_2-512-vrcpbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vrcpbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vreducebf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vreducebf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vrndscalebf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vrndscalebf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vrsqrtbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vrsqrtbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vscalefbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vscalefbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vsqrtbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vsqrtbf16-2.h: ...here.
* gcc.target/i386/avx10_2-512-vsubbf16-2.c:
Remove 512 test. Move to...
* gcc.target/i386/avx10_2-vsubbf16-2.h: ...here.
* gcc.target/i386/sm4key4-avx10_2-512-2.c:
Remove 512 test. Move to...
* gcc.target/i386/sm4key4-avx10_2-2.c: ...here.
* gcc.target/i386/sm4rnds4-avx10_2-512-2.c:
Remove 512 test. Move to...
* gcc.target/i386/sm4rnds4-avx10_2-2.c: ...here.
* gcc.target/i386/vnniint16-auto-vectorize-4.c: Use AVX10_SCALAR
for 512 bit test.
* gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.

Haochen Jiang [Fri, 14 Mar 2025 07:00:33 +0000 (15:00 +0800)]

i386: Combine AVX10.2 intrin files

Since we use a single avx10.2 to enable everything, there is
no need to split them into two files.

gcc/ChangeLog:

* config.gcc: Remove 512 intrin file.
* config/i386/avx10_2-512bf16intrin.h:
Removed and combined to ...
* config/i386/avx10_2bf16intrin.h: ... this.
* config/i386/avx10_2-512convertintrin.h:
Removed and combined to ...
* config/i386/avx10_2convertintrin.h: ... this.
* config/i386/avx10_2-512mediaintrin.h:
Removed and combined to ...
* config/i386/avx10_2mediaintrin.h: ... this.
* config/i386/avx10_2-512minmaxintrin.h:
Removed and combined to ...
* config/i386/avx10_2minmaxintrin.h: ... this.
* config/i386/avx10_2-512satcvtintrin.h:
Removed and combined to ...
* config/i386/avx10_2satcvtintrin.h: ... this.
* config/i386/immintrin.h: Remove 512 intrin file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Combine tests and change
intrin file name.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

Haochen Jiang [Fri, 14 Mar 2025 06:27:36 +0000 (14:27 +0800)]

i386: Remove duplicate iterators in md

Several iterators in the md files are no longer needed: after the
AVX10 refactor, the patterns can use the legacy AVX512 iterators
directly. Remove those duplicate iterators.

gcc/ChangeLog:

* config/i386/sse.md (VF1_VF2_AVX10_2): Removed.
(VF2_AVX10_2): Ditto.
(VI1248_AVX10_2): Ditto.
(VFH_AVX10_2): Ditto.
(VF1_AVX10_2): Ditto.
(VHF_AVX10_2): Ditto.
(VBF_AVX10_2): Ditto.
(VI8_AVX10_2): Ditto.
(VI2_AVX10_2): Ditto.
(VBF): New.
(div<mode>3): Use VBF instead of AVX10.2 ones.
(vec_cmp<mode><avx512fmaskmodelower>): Ditto.
(avx10_2_cvt2ps2phx_<mode><mask_name><round_name>):
Use VHF_AVX512VL instead of AVX10.2 ones.
(vcvt<convertfp8_pack><mode><mask_name>): Ditto.
(vcvthf82ph<mode><mask_name>): Ditto.
(VHF_AVX10_2_2): Remove not needed TARGET_AVX10_2.
(usdot_prod<sseunpackmodelower><mode>): Use VI2_AVX512F
instead of AVX10.2 ones.
(vdpphps_<mode>): Use VF1_AVX512VL instead of AVX10.2 ones.
(vdpphps_<mode>_mask): Ditto.
(vdpphps_<mode>_maskz): Ditto.
(vdpphps_<mode>_maskz_1): Ditto.
(avx10_2_scalefbf16_<mode><mask_name>): Use VBF instead of
AVX10.2 ones.
(<code><mode>3): Ditto.
(avx10_2_<code>bf16_<mode><mask_name>): Ditto.
(avx10_2_fmaddbf16_<mode>_maskz): Ditto.
(avx10_2_fmaddbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmaddbf16_<mode>_mask): Ditto.
(avx10_2_fmaddbf16_<mode>_mask3): Ditto.
(avx10_2_fnmaddbf16_<mode>_maskz): Ditto.
(avx10_2_fnmaddbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmaddbf16_<mode>_mask): Ditto.
(avx10_2_fnmaddbf16_<mode>_mask3): Ditto.
(avx10_2_fmsubbf16_<mode>_maskz): Ditto.
(avx10_2_fmsubbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmsubbf16_<mode>_mask): Ditto.
(avx10_2_fmsubbf16_<mode>_mask3): Ditto.
(avx10_2_fnmsubbf16_<mode>_maskz): Ditto.
(avx10_2_fnmsubbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmsubbf16_<mode>_mask): Ditto.
(avx10_2_fnmsubbf16_<mode>_mask3): Ditto.
(avx10_2_rsqrtbf16_<mode><mask_name>): Ditto.
(avx10_2_sqrtbf16_<mode><mask_name>): Ditto.
(avx10_2_rcpbf16_<mode><mask_name>): Ditto.
(avx10_2_getexpbf16_<mode><mask_name>): Ditto.
(avx10_2_<bf16immop>bf16_<mode><mask_name>): Ditto.
(avx10_2_fpclassbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cmpbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>bf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
Ditto.
(avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Use VHF_AVX512VL instead of AVX10.2 ones.
(avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Use VF1_AVX512VL instead of AVX10.2 ones.
(avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_vcvtt<castmode>2<sat_cvt_sign_prefix>dqs<mode><mask_name><round_saeonly_name>):
Use VF instead of AVX10.2 ones.
(avx10_2_vcvttpd2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Use VF2 instead of AVX10.2 ones.
(avx10_2_vcvttps2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Use VI8 instead of AVX10.2 ones.
(avx10_2_minmaxbf16_<mode><mask_name>): Use VBF instead of
AVX10.2 ones.
(avx10_2_minmaxp<mode><mask_name><round_saeonly_name>):
Use VFH_AVX512VL instead of AVX10.2 ones.
(avx10_2_vmovrs<ssemodesuffix><mode><mask_name>):
Use VI1248_AVX512VLBW instead of AVX10.2 ones.

Haochen Jiang [Wed, 14 May 2025 06:57:41 +0000 (14:57 +0800)]

i386: Remove avx10.1-256/512 and evex512 options

As announced in GCC 15, avx10.1-256/512 and evex512 are removed in
GCC 16. The behavior of combined AVX10 and AVX512 options is also
simplified in GCC 16: AVX10.1 now implies AVX512, so the behavior
matches everyone else.
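
The implication described above could be observed from user code through predefined macros. The following is a minimal sketch, assuming GCC predefines `__AVX10_1__` and `__AVX512F__` when the corresponding ISAs are enabled; `has_avx10_1`, `has_avx512f` and `describe_isa` are hypothetical names used only for illustration and are not part of GCC.

```cpp
#include <string>

// Assumption: the compiler predefines __AVX10_1__ when AVX10.1 is enabled.
#ifdef __AVX10_1__
constexpr bool has_avx10_1 = true;
#else
constexpr bool has_avx10_1 = false;
#endif

// Assumption: the compiler predefines __AVX512F__ when AVX512F is enabled.
#ifdef __AVX512F__
constexpr bool has_avx512f = true;
#else
constexpr bool has_avx512f = false;
#endif

// Hypothetical helper: summarize which of the two ISAs this translation
// unit was compiled with.
std::string describe_isa()
{
  if (has_avx10_1 && has_avx512f)
    return "AVX10.1 with AVX512F";
  if (has_avx10_1)
    return "AVX10.1 without AVX512F";
  if (has_avx512f)
    return "AVX512F without AVX10.1";
  return "neither AVX10.1 nor AVX512F";
}
```

If AVX10.1 implies AVX512 as the commit message states, a build with the AVX10.1 option enabled would be expected to take the first branch; which flags a given compiler version accepts should be checked against its documentation.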

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_available_features): Remove feature set for AVX10_1_256.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): Removed.
(OPTION_MASK_ISA2_AVX10_1_256_SET): Removed.
(OPTION_MASK_ISA_AVX10_1_SET): Imply all AVX512 features.
(OPTION_MASK_ISA2_AVX10_1_SET): Ditto.
(OPTION_MASK_ISA2_AVX2_UNSET): Remove AVX10_1_UNSET.
(OPTION_MASK_ISA2_EVEX512_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_1_UNSET): Remove AVX10_1_256.
(OPTION_MASK_ISA2_AVX512F_UNSET): Unset AVX10_1.
(ix86_handle_option): Remove special handling for AVX512/AVX10.1
options, evex512 and avx10_1_256. Modify ISA set for AVX10 options.
* common/config/i386/i386-cpuinfo.h
(enum feature_priority): Remove P_AVX10_1_256.
(enum processor_features): Remove FEATURE_AVX10_1_256.
* common/config/i386/i386-isas.h: Remove avx10.1-256/512.
* config/i386/avx512bf16intrin.h: Rollback target push before
evex512 is introduced.
* config/i386/avx512bf16vlintrin.h: Ditto.
* config/i386/avx512bitalgintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: Ditto.
* config/i386/avx512bwintrin.h: Ditto.
* config/i386/avx512cdintrin.h: Ditto.
* config/i386/avx512dqintrin.h: Ditto.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/avx512fp16intrin.h: Ditto.
* config/i386/avx512fp16vlintrin.h: Ditto.
* config/i386/avx512ifmaintrin.h: Ditto.
* config/i386/avx512ifmavlintrin.h: Ditto.
* config/i386/avx512vbmi2intrin.h: Ditto.
* config/i386/avx512vbmi2vlintrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vbmivlintrin.h: Ditto.
* config/i386/avx512vlbwintrin.h: Ditto.
* config/i386/avx512vldqintrin.h: Ditto.
* config/i386/avx512vlintrin.h: Ditto.
* config/i386/avx512vnniintrin.h: Ditto.
* config/i386/avx512vnnivlintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vp2intersectvlintrin.h: Ditto.
* config/i386/avx512vpopcntdqintrin.h: Ditto.
* config/i386/avx512vpopcntdqvlintrin.h: Ditto.
* config/i386/gfniintrin.h: Ditto.
* config/i386/vaesintrin.h: Ditto.
* config/i386/vpclmulqdqintrin.h: Ditto.
* config/i386/driver-i386.cc (check_avx512_features): Removed.
(host_detect_local_cpu): Remove -march=native special handling.
* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Remove TARGET_EVEX512.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Remove EVEX512 and AVX10_1_256.
* config/i386/i386-expand.cc
(ix86_valid_mask_cmp_mode): Remove TARGET_EVEX512.
(ix86_expand_int_sse_cmp): Ditto.
(ix86_vector_duplicate_simode_const): Ditto.
(ix86_expand_vector_init_duplicate): Ditto.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_emit_swsqrtsf): Ditto.
(ix86_vectorize_vec_perm_const): Ditto.
(ix86_expand_vecop_qihi2): Ditto.
(ix86_expand_sse2_mulvxdi3): Ditto.
(ix86_gen_bcst_mem): Ditto.
* config/i386/i386-isa.def (EVEX512): Removed.
(AVX10_1_256): Ditto.
* config/i386/i386-options.cc
(isa2_opts): Remove evex512 and avx10.1-256.
(ix86_function_specific_save): Remove no_avx512_explicit and
no_avx10_1_explicit.
(ix86_function_specific_restore): Ditto.
(ix86_valid_target_attribute_inner_p): Remove evex512 and
avx10.1-256/512.
(ix86_valid_target_attribute_tree): Remove special handling
to rerun ix86_option_override_internal for AVX10.1-256.
(ix86_option_override_internal): Remove warning handling.
(ix86_simd_clone_adjust): Remove evex512.
* config/i386/i386.cc
(type_natural_mode): Remove TARGET_EVEX512.
(ix86_return_in_memory): Ditto.
(standard_sse_constant_p): Ditto.
(standard_sse_constant_opcode): Ditto.
(ix86_get_ssemov): Ditto.
(ix86_legitimate_constant_p): Ditto.
(ix86_vectorize_builtin_scatter): Ditto.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_autovectorize_vector_modes): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Ditto.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Ditto.
(MOVE_MAX): Ditto.
(STORE_MAX_PIECES): Ditto.
(PTA_SKYLAKE_AVX512): Remove PTA_EVEX512.
(PTA_CANNONLAKE): Ditto.
(PTA_ZNVER4): Ditto.
(PTA_GRANITERAPIDS): Use PTA_AVX10_1.
(PTA_DIAMONDRAPIDS): Use PTA_GRANITERAPIDS.
* config/i386/i386.md: Remove TARGET_EVEX512, avx512f_512
and avx512bw_512.
* config/i386/i386.opt: Remove ix86_no_avx512_explicit,
ix86_no_avx10_1_explicit, mevex512, mavx10.1-256/512 and
warning for mavx10.1. Modify option comment.
* config/i386/i386.opt.urls: Remove evex512 and avx10.1-256/512.
* config/i386/predicates.md: Remove TARGET_EVEX512.
* config/i386/sse.md: Ditto.
* doc/extend.texi: Remove avx10.1-256/512. Modify avx10.1 doc.
* doc/invoke.texi: Remove avx10.1-256/512 and evex512.
* doc/sourcebuild.texi: Remove avx10.1-256/512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-1.c: Remove warning.
* gcc.target/i386/avx10_1-2.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-4.c: Ditto.
* gcc.target/i386/pr111068.c: Ditto.
* gcc.target/i386/pr117946.c: Ditto.
* gcc.target/i386/pr117240_avx512f.c: Remove -mevex512 and
warning.
* gcc.target/i386/avx10_1-11.c: Rename to ...
* gcc.target/i386/avx10_1-5.c: ... this. Remove warning.
* gcc.target/i386/avx10_1-12.c: Rename to ...
* gcc.target/i386/avx10_1-6.c: ... this. Remove warning.
* gcc.target/i386/avx10_1-26.c: Rename to ...
* gcc.target/i386/avx10_1-7.c: ... this. Remove warning.
The original avx10_1-7.c is removed.
* gcc.target/i386/avx10_1-10.c: Removed.
* gcc.target/i386/avx10_1-13.c: Removed.
* gcc.target/i386/avx10_1-14.c: Removed.
* gcc.target/i386/avx10_1-15.c: Removed.
* gcc.target/i386/avx10_1-16.c: Removed.
* gcc.target/i386/avx10_1-17.c: Removed.
* gcc.target/i386/avx10_1-18.c: Removed.
* gcc.target/i386/avx10_1-19.c: Removed.
* gcc.target/i386/avx10_1-20.c: Removed.
* gcc.target/i386/avx10_1-21.c: Removed.
* gcc.target/i386/avx10_1-22.c: Removed.
* gcc.target/i386/avx10_1-23.c: Removed.
* gcc.target/i386/avx10_1-8.c: Removed.
* gcc.target/i386/avx10_1-9.c: Removed.
* gcc.target/i386/noevex512-1.c: Removed.
* gcc.target/i386/noevex512-2.c: Removed.
* gcc.target/i386/noevex512-3.c: Removed.
* gcc.target/i386/pr111889.c: Removed.
* gcc.target/i386/pr111907.c: Removed.

Haochen Jiang [Wed, 14 May 2025 07:19:42 +0000 (15:19 +0800)]

i386: Unpush OPTION_MASK_ISA2_EVEX512 for builtins

As announced in GCC 15, evex512 will be removed in GCC 16: it is no
longer useful now that 512 bit is available directly. As a first
step, this patch unpushes evex512 in the builtins.

gcc/ChangeLog:

* config/i386/i386-builtin.def
(BDESC): Remove OPTION_MASK_ISA2_EVEX512.
* config/i386/i386-builtins.cc
(ix86_init_mmx_sse_builtins): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr90096.c: Adjust error message.
* gcc.target/i386/pr117304-1.c: Removed.

GCC Administrator [Mon, 19 May 2025 00:16:39 +0000 (00:16 +0000)]

Daily bump.

Dimitar Dimitrov [Sat, 3 May 2025 19:38:30 +0000 (22:38 +0300)]

emit-rtl: Allow extra checks for paradoxical subregs [PR119966]

When a paradoxical subreg is detected, validate_subreg exits early, thus
skipping the important checks later in the function.

Fix by continuing with the checks instead of declaring early that the
paradoxical subreg is valid.

One of the newly allowed subsequent checks needed to be disabled for
paradoxical subregs.  It turned out that combine attempts to create
a paradoxical subreg of mem even for strict-alignment targets.
That is invalid and should eventually be rejected, but is
temporarily left allowed to prevent regressions for
armv8l-unknown-linux-gnueabihf.  See PR120329 for more details.

Tests I did:
- No regressions were found for C and C++ for the following targets:
   - native x86_64-pc-linux-gnu
   - cross riscv64-unknown-linux-gnu
   - cross riscv32-none-elf
- Sanity checked armv8l-unknown-linux-gnueabihf by cross-building
   up to and including libgcc.  Linaro CI bot further confirmed there
   are no regressions.
- Sanity checked powerpc64-unknown-linux-gnu by building native
   toolchain, but I could not set up qemu-user for DejaGnu testing.

PR target/119966

gcc/ChangeLog:

* emit-rtl.cc (validate_subreg): Do not exit immediately for
paradoxical subregs.  Filter subsequent tests which are
not valid for paradoxical subregs.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Eric Botcazou [Sun, 18 May 2025 17:10:26 +0000 (19:10 +0200)]

Partially lift restriction from loc_list_from_tree_1

The function accepts all handled_component_p expressions and decodes them by
means of get_inner_reference as expected, but bails out on bitfields:

        /* TODO: We can extract value of the small expression via shifting
           even for nonzero bitpos.  */
        if (list_ret == 0)
          return 0;
        if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
            || !multiple_p (bitsize, BITS_PER_UNIT))
          {
            expansion_failed (loc, NULL_RTX,
                              "bitfield access");
            return 0;
          }

This lifts the second part of the restriction, which helps for obscure cases
of packed discriminated record types in Ada, although this requires the very
latest GDB sources.
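
The TODO in the quoted code mentions extracting a small value by shifting. As a rough illustration of that general idea only (not of what dwarf2out.cc emits, which is a DWARF location expression), the sketch below reads a bit-field of arbitrary position and size from a byte buffer. `extract_bits` is a hypothetical helper, and it assumes little-endian bit numbering (bit 0 is the least significant bit of byte 0) and a field of at most 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not GCC code: return the `bitsize`-bit field that
// starts `bitpos` bits into `base`.
// Assumptions: little-endian bit numbering, 1 <= bitsize <= 32, and the
// buffer is large enough to cover the field.
std::uint64_t extract_bits(const std::uint8_t *base, unsigned bitpos,
                           unsigned bitsize)
{
  assert(bitsize >= 1 && bitsize <= 32);

  const std::size_t bytepos = bitpos / 8;   // first byte touched by the field
  const unsigned shift = bitpos % 8;        // bit offset inside that byte
  const unsigned nbytes = (shift + bitsize + 7) / 8;  // bytes to load (<= 5)

  // Assemble the covering bytes into one word, least significant byte first.
  std::uint64_t word = 0;
  for (unsigned i = 0; i < nbytes; ++i)
    word |= static_cast<std::uint64_t>(base[bytepos + i]) << (8 * i);

  // Shift the field down to bit 0 and mask off everything above it.
  const std::uint64_t mask = (std::uint64_t{1} << bitsize) - 1;
  return (word >> shift) & mask;
}
```

For the two-byte buffer `{0xB4, 0x01}`, a 3-bit field at bit position 2 lies within the first byte, while a 3-bit field at bit position 6 straddles both bytes; the same shift-and-mask step handles both cases.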

gcc/
* dwarf2out.cc (loc_list_from_tree_1) <COMPONENT_REF>: Do not bail
out when the size is not a multiple of a byte.
Deal with bit-fields whose size is not a multiple of a byte when
dereferencing an address.

Andrew Pinski [Sun, 18 May 2025 00:21:39 +0000 (17:21 -0700)]

phiopt: Use mark_lhs_in_seq_for_dce instead of doing it inline

Right now phiopt has the same code as mark_lhs_in_seq_for_dce
inlined into match_simplify_replacement. Instead let's use the
function in gimple-fold that does the same thing.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-fold.cc (mark_lhs_in_seq_for_dce): Make
non-static.
* gimple-fold.h (mark_lhs_in_seq_for_dce): Declare.
* tree-ssa-phiopt.cc (match_simplify_replacement): Use
mark_lhs_in_seq_for_dce instead of manually looping.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Mark Wielaard [Sun, 18 May 2025 14:20:10 +0000 (16:20 +0200)]

Regenerate cobol/lang.opt.urls

The Cobol frontend lang.opt got -M added, but lang.opt.urls wasn't
regenerated.

Fixes: 92b6485a75ca ("cobol: Eliminate exception "blob"; streamline some code generation.")
gcc/cobol/ChangeLog:

* lang.opt.urls: Regenerated.

GCC Administrator [Sun, 18 May 2025 00:17:00 +0000 (00:17 +0000)]

Daily bump.

Oleg Endo [Sat, 17 May 2025 16:51:35 +0000 (10:51 -0600)]

[PATCH] libgcc SH: fix alignment for relaxation

From 6462f1e6a2565c5d4756036d9bc2f39dce9bd768 Mon Sep 17 00:00:00 2001
From: QBos07 <qubos@outlook.de>
Date: Sat, 10 May 2025 16:56:28 +0000
Subject: [PATCH] libgcc SH: fix alignment for relaxation

When relaxation is enabled, we cannot infer the alignment
from the position, as that may change. This should not change
non-relaxed builds, as the code is already aligned there. This was
the missing piece for building an entire toolchain with -mrelax.

Credit goes to Oleg Endo: https://sourceware.org/bugzilla/show_bug.cgi?id=3298#c4

libgcc/
* config/sh/lib1funcs.S (ashiftrt_r4_32): Increase alignment.
(movemem): Force alignment of the mova instruction.

Jeff Law [Sat, 17 May 2025 15:37:01 +0000 (09:37 -0600)]

[RISC-V] Fix ICE due to bogus use of gen_rtvec

Found this while setting up the risc-v coordination branch off of gcc-15.  Not
sure why I didn't use rtvec_alloc directly here since we're going to initialize
the whole vector ourselves.  Using gen_rtvec was just wrong as it's walking
down a non-existent varargs list.  Under the "right" circumstances it can walk
off a page and fault.
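
The difference between the two allocation styles can be sketched with toy stand-ins. Everything below is hypothetical: `toy_vec`, `toy_vec_alloc` and `toy_gen_vec` only mimic the shape of rtvec_alloc and gen_rtvec as described above and are not GCC's API. The point is that a varargs constructor reads exactly as many arguments as its count says, so it is only safe when that many arguments are really passed, while an allocate-then-fill helper has no such requirement.

```cpp
#include <cstdarg>
#include <vector>

// Hypothetical stand-in for an RTL vector; not GCC's rtvec.
using toy_vec = std::vector<int>;

// Analogue of rtvec_alloc: make room for n elements and let the caller
// fill them in afterwards.
toy_vec toy_vec_alloc(int n)
{
  return toy_vec(n);
}

// Analogue of gen_rtvec: read exactly n ints from the variadic arguments.
// Calling this with fewer than n trailing arguments is undefined behavior,
// because va_arg would walk past the arguments actually supplied.
toy_vec toy_gen_vec(int n, ...)
{
  toy_vec v(n);
  va_list ap;
  va_start(ap, n);
  for (int i = 0; i < n; ++i)
    v[i] = va_arg(ap, int);
  va_end(ap);
  return v;
}

// When every element is computed in a loop anyway, allocate-then-fill is
// the natural choice: no variadic arguments are involved at all.
toy_vec make_reversed_indices(int n)
{
  toy_vec v = toy_vec_alloc(n);
  for (int i = 0; i < n; ++i)
    v[i] = n - 1 - i;
  return v;
}
```

A call such as `toy_gen_vec(3, 7, 8, 9)` is fine because three values follow the count; the hazard described in the commit message corresponds to passing a count with no matching arguments.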

This was seen with a test already in the testsuite (I forget which test), so no
new regression test.

Tested in my tester and verified the failure on the coordination branch is
resolved as well.  Waiting on pre-commit CI to render a verdict.

gcc/
* config/riscv/riscv-vect-permconst.cc (vector_permconst:process_bb):
Use rtvec_alloc, not gen_rtvec since we don't want/need to initialize
the vector.

Mirror of https://gcc.gnu.org/git/gcc.git
