and the memory operand size is 1 byte. As a result, the remaining 511
bytes are ignored by GCC. Implement ldtilecfg and sttilecfg intrinsics
with a pointer to XImode to honor the 512-byte memory block.
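A minimal usage sketch (hypothetical: the buffer name and zeroed palette
are illustrative; assumes a toolchain with AMX support and -mamx-tile):

  #include <immintrin.h>

  // With the new builtins the whole configuration block is the memory
  // operand, so GCC can no longer discard stores into it as dead.
  void roundtrip_tilecfg (void)
  {
    alignas(64) unsigned char cfg[512] = { 0 };  // zeroed config: palette 0
    _tile_loadconfig (cfg);    // reads the full block
    _tile_storeconfig (cfg);   // writes the full block back
  }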
gcc/ChangeLog:
PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.c (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.
gcc/testsuite/ChangeLog:
PR target/114098
* gcc.target/i386/amxtile-4.c: New test.
There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects. The solution is to modify the constraints used in the movoo and
movxo patterns to disallow these types of addresses, which assists LRA in
resolving this issue. Furthermore, the mode size 16 check has been
removed from vsx_quad_dform_memory_operand to allow OOmode and XOmode,
since quad_address_p already handles sizes less than 16.
Eric Botcazou [Mon, 26 Feb 2024 12:13:34 +0000 (13:13 +0100)]
Finalization of object allocated by anonymous access designating local type
The finalization of objects dynamically allocated through an anonymous
access type is deferred to the enclosing library unit in the current
implementation, and a warning is given for each of them.
However, this cannot be done if the designated type is local, because this
would generate dangling references to the local finalization routine, so
the finalization needs to be dropped in this case and the warning adjusted.
gcc/ada/
PR ada/113893
* exp_ch7.adb (Build_Anonymous_Master): Do not build the master
for a local designated type.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Force Needs_Fin
to false if no finalization master is attached to an access type
and assert that it is anonymous in this case.
* sem_res.adb (Resolve_Allocator): Mention that the object might
not be finalized at all in the warning given when the type is an
anonymous access-to-controlled type.
Richard Earnshaw [Thu, 22 Feb 2024 16:47:20 +0000 (16:47 +0000)]
arm: fix ICE with vectorized reciprocal division [PR108120]
The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations was enabled; this led to
an ICE when the pattern we generated could not be recognized.
Fixed by only enabling vector division when doing unsafe math.
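A sketch of the kind of loop involved (my reduction, not the new test):
compiled for Neon at -O3, the division may only go through the
reciprocal-step patterns when -funsafe-math-optimizations is given.

  // Vectorizable single-precision division; with unsafe math enabled the
  // Neon backend may expand this via vrecpe/vrecps reciprocal estimates.
  void vdiv (float *__restrict a, const float *__restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] /= b[i];
  }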
gcc:
PR target/108120
* config/arm/neon.md (div<VCVTF:mode>3): Rename from div<mode>3.
Gate with ARM_HAVE_NEON_<MODE>_ARITH.
gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs; in the GPR case that is an x-register ldp/stp, while in the FPR
case it is a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
Unlike for GCC 14 we need additional handling in the load/store pair
code as various cases are not expecting to see V16QImode (particularly
the writeback patterns, but also aarch64_gen_load_pair).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.c (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
(aarch64_gen_storewb_pair): Handle V16QImode.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
* config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
Rename to ...
(loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
(storewb_pair<TX:mode>_<P:mode>): Rename to ...
(storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
* config/aarch64/iterators.md (TX_V16QI): New.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
Jakub Jelinek [Thu, 15 Feb 2024 19:04:01 +0000 (20:04 +0100)]
testsuite: Require lra effective target for pr107385.c
Old reload doesn't support asm goto with output operands.
We have an lra effective target (though, strangely, it returns
0 for just 2 targets out of at least 16 targets with no LRA support),
so this patch uses it, similarly to how it is done in other asm goto
tests with output operands.
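The gating is a one-line DejaGnu directive in the test; one common
spelling (exact placement in pr107385.c is a sketch):

  /* { dg-require-effective-target lra } */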
Jakub Jelinek [Thu, 15 Feb 2024 14:53:01 +0000 (15:53 +0100)]
expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921]
The Linux kernel and the following testcase distilled from it are
miscompiled, because tree-outof-ssa.cc (eliminate_phi) emits some
fixups on some of the edges (but doesn't commit edge insertions).
Later expand_asm_stmt emits further instructions on the same edge.
Now the problem is that expand_asm_stmt uses insert_insn_on_edge
to add its own fixups, but that function appends to the existing
sequence on the edge if any. And the bug triggers when the
fixup sequence emitted by eliminate_phi uses a pseudo which the
fixup sequence emitted by expand_asm_stmt later on sets.
So, we end up with
(set (reg A) (asm_operands ...))
and on one of the edges queued sequence
(set (reg C) (reg B)) // added by eliminate_phi
(set (reg B) (reg A)) // added by expand_asm_stmt
That is wrong: what we emit in expand_asm_stmt needs to be as close
to the asm_operands as possible (they aren't known until expand_asm_stmt
is called, and the PHI fixup code assumes it is reg B that holds the right
value), and the PHI adjustments need to be done after it.
So, the following patch introduces a prepend_insn_to_edge function and
uses it from expand_asm_stmt, so that we queue
(set (reg B) (reg A)) // added by expand_asm_stmt
(set (reg C) (reg B)) // added by eliminate_phi
instead and so the value from the asm_operands output propagates correctly
to the PHI result.
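A reduced sketch of the shape involved (hypothetical, not the kernel
testcase): an asm goto output that feeds a PHI at the join point.

  int g (int);

  int
  f (void)
  {
    int a;
    /* Output set on the fallthrough edge; its fixup copy must precede
       the PHI fixup queued on the same edge.  */
    asm goto ("" : "=r" (a) : : : out);
    return g (a);   /* uses the PHI result of 'a' */
   out:
    return 0;
  }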
2024-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113921
* cfgrtl.h (prepend_insn_to_edge): New declaration.
* cfgrtl.c (insert_insn_on_edge): Clarify behavior in function
comment.
(prepend_insn_to_edge): New function.
* cfgexpand.c (expand_asm_stmt): Use prepend_insn_to_edge instead of
insert_insn_on_edge.
The linking of libgcc is already present in %(liborig), so the current
situation duplicates libraries. This was not an issue until macOS's new
linker started giving warnings for such cases.
Harald Anlauf [Sat, 27 Jan 2024 16:41:43 +0000 (17:41 +0100)]
Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]
Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies. However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies. The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.
gcc/fortran/ChangeLog:
PR fortran/104908
* trans-array.c (gfc_conv_array_ref): Restrict use of transformed
descriptor (sym->backend_decl) to the unlimited polymorphic case.
gcc/testsuite/ChangeLog:
PR fortran/104908
* gfortran.dg/pr104908.f90: New test.
Martin Jambor [Fri, 9 Feb 2024 17:58:43 +0000 (18:58 +0100)]
sra: Disqualify bases of operands of asm gotos
PR 110422 shows that SRA can ICE when it assumes there is a single
edge outgoing from a block terminated with an asm goto. We need a
single outgoing edge for BB-terminating statements so that any
adjustments they make to the aggregates can be copied over to their
replacements. Because we can't have that after asm gotos, we need to
punt.
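A sketch of the kind of input that must now be disqualified (my
reduction): the asm goto both terminates its block and writes into an
SRA candidate aggregate.

  struct S { int i, j; };

  int
  f (void)
  {
    S s = { 0, 0 };                         // SRA candidate
    asm goto ("" : "=r" (s.i) : : : fail);  // terminates the block
    return s.i + s.j;
   fail:
    return -1;
  }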
gcc/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* tree-sra.c (scan_function): Disqualify bases of operands of asm
gotos.
gcc/testsuite/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
Xi Ruoyao [Fri, 2 Feb 2024 19:35:07 +0000 (03:35 +0800)]
MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors; this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
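The scalar analogue of the bug (a sketch): subtracting from zero loses
the sign of a zero operand, while negation must flip the sign bit.

  #include <cmath>
  #include <cstdio>

  int
  main ()
  {
    float x = 0.0f;
    std::printf ("%d %d\n",
                 (int) std::signbit (-x),         // 1: -0.0
                 (int) std::signbit (0.0f - x));  // 0: +0.0, wrong for neg
  }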
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.c (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
Jason Merrill [Tue, 19 Dec 2023 21:12:02 +0000 (16:12 -0500)]
c++: xvalue array subscript [PR103185]
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.
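A sketch of the reversed-operand form (my reduction, not necessarily the
PR testcase):

  struct A { int a[3]; };
  A f ();

  int
  g ()
  {
    return 0[f ().a];   // same as f ().a[0]: subscripting an xvalue array
  }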
Jan Hubicka [Thu, 22 Dec 2022 09:55:46 +0000 (10:55 +0100)]
Zen4 tuning part 2
Adds tunes needed for the zen4 microarchitecture. I added two new knobs.
TARGET_AVX512_SPLIT_REGS is used to specify that 512-bit vectors are
internally split into 256-bit vectors. This affects vectorization costs and
reassociation width. It probably should also affect RTX costs, though I
doubt that is very useful since RTL optimizers are usually not judging
between 256-bit and 512-bit vectors.
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since FMA has improved in zen4,
this flag may not be a win except for very specific benchmarks. I am still
doing some more detailed testing here.
Otherwise, I disabled gathers on zen4 for 2 parts and 4 parts. We can open
code them, and since the latencies have only increased since zen3, open
coding is better than the actual instruction. This shows in 4 TSVC
benchmarks.
I ended up setting AVX256_OPTIMAL. This is a compromise. There are some TSVC
benchmarks that improve noticeably (up to 250%), but there are also a few
regressions. Most of these can be solved by increasing the vec_perm cost in
the vectorizer. However, this does not cure a roughly 14% regression on x264
that is quite important. Here we produce vectorized loops for avx512 that
probably would be faster if the loops in question had a high enough
iteration count. We hit this problem with avx256 too: since the loop
iterates few times, only prologues/epilogues are used. Adding another round
of prologue/epilogue code does not make it better.
Finally, I enabled avx stores for constant-sized memcpy and memset. I am not
sure why this is an opt-in feature. I think for most hardware this is a win.
gcc/ChangeLog:
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/i386-expand.c (ix86_expand_set_or_cpymem): Add
TARGET_AVX512_SPLIT_REGS.
* config/i386/i386-options.c (ix86_option_override_internal):
Honor X86_TUNE_AVOID_256FMA_CHAINS.
* config/i386/i386.c (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS.
(ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable
for znver4.
(X86_TUNE_USE_GATHER_4PARTS): Likewise.
(X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4.
(X86_TUNE_AVOID_512FMA_CHAINS): New tune; set for znver4.
(X86_TUNE_AVX256_OPTIMAL): Add znver4.
(X86_TUNE_AVX512_SPLIT_REGS): New tune.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4.
(X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.
Jan Hubicka [Thu, 22 Dec 2022 01:16:24 +0000 (02:16 +0100)]
Update znver4 costs
Update costs of znver4, mostly based on data measured by Agner Fog.
Compared to previous generations, x87 became a bit slower, which is probably
not a big deal (and we have minimal benchmarking coverage for it). One
interesting improvement is the reduction of FMA cost. I also updated costs
of AVX256 loads/stores based on latencies (not throughput, which is twice
that of avx256).
Overall, AVX512 vectorization seems to improve some TSVC benchmarks
noticeably, but since 512-bit vectors are internally split into 256-bit
vectors it is somewhat risky and does not win in SPEC scores (mostly by
regressing benchmarks with loops that have a small trip count, like x264
and exchange), so for now I am going to set the AVX256_OPTIMAL tune, but I
am still playing with it. We have improved since ZNVER1 at choosing the
vectorization size and also have vectorized prologues/epilogues, so it may
be possible to make avx512 a small win overall.
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/x86-tune-costs.h (znver4_cost): Update costs of FP and SSE
moves, division, multiplication, gathers, L2 cache size, and more
complex FP instructions.
Ken Matsui [Thu, 11 Jan 2024 06:08:07 +0000 (22:08 -0800)]
libstdc++: Fix error handling in filesystem::equivalent [PR113250]
This patch makes std::filesystem::equivalent correctly throw an exception
when either path does not exist, as per [fs.op.equivalent]/4.
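A sketch of the corrected behavior (paths are illustrative):

  #include <filesystem>
  #include <system_error>
  namespace fs = std::filesystem;

  int
  main ()
  {
    std::error_code ec;
    // One path exists and the other doesn't: an error per
    // [fs.op.equivalent]/4 (previously only reported when *both* failed).
    fs::equivalent (".", "does-not-exist", ec);
    if (!ec)
      return 1;          // fixed library sets ec here
    try
      {
        fs::equivalent (".", "does-not-exist");  // throwing overload
      }
    catch (const fs::filesystem_error &)
      {
        return 0;        // expected
      }
    return 1;
  }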
PR libstdc++/113250
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
* src/filesystem/ops.cc (fs::equivalent): Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
error codes.
* testsuite/experimental/filesystem/operations/equivalent.cc:
Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit df147e2ee7199d33d66959c6509ce9c21072077f)
Patrick Palka [Wed, 3 Jan 2024 02:31:20 +0000 (21:31 -0500)]
libstdc++: testsuite: Reduce max_size_type.cc exec time [PR113175]
The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
inadvertently increased the execution time of this test by over 5x due
to making the two main loops actually run in the signed_p case instead
of being dead code.
To compensate, this patch cuts the relevant loops' range [-1000,1000] by
10x as proposed in the PR. This shouldn't significantly weaken the test
since the same important edge cases are still checked in the smaller range
and/or elsewhere. On my machine this reduces the test's execution time by
roughly 10x (and 1.6x relative to before r14-205).
PR testsuite/113175
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
(test03): Likewise.
Patrick Palka [Fri, 22 Sep 2023 10:25:49 +0000 (06:25 -0400)]
c++: constraint rewriting during ttp coercion [PR111485]
In order to compare the constraints of a ttp with those of its argument,
we rewrite the ttp's constraints in terms of the argument template's
template parameters. The substitution to achieve this currently uses a
single level of template arguments, but that never does the right thing
because a ttp's template parameters always have level >= 2. This patch
fixes this by including the outer template arguments in the substitution,
which ought to match the depth of the ttp.
The second testcase demonstrates it's better to substitute the concrete
outer template arguments instead of generic ones since a ttp's constraints
could depend on outer parameters.
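A sketch of the shape involved (not the PR's exact testcases):

  template<class T> concept C = true;

  // TT's own parameter lives at level 2, so rewriting its constraint C
  // in terms of B's template parameters needs more than a single level
  // of template arguments.
  template<template<C> class TT> struct X { };

  template<C U> struct B { };

  X<B> x;   // requires comparing TT's constraints with B's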
PR c++/111485
gcc/cp/ChangeLog:
* pt.c (is_compatible_template_arg): New parameter 'args'.
Add the outer template arguments 'args' to 'new_args'.
(convert_template_argument): Pass 'args' to
is_compatible_template_arg.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-ttp5.C: New test.
* g++.dg/cpp2a/concepts-ttp6.C: New test.
Jason Merrill [Fri, 23 Apr 2021 20:41:35 +0000 (16:41 -0400)]
c++: -Wdeprecated-copy and using operator= [PR92145]
For the purpose of [depr.impldec] "if the class has a user-declared copy
assignment operator", an operator= brought in from a base class with 'using'
may be a copy-assignment operator, but it isn't a copy-assignment operator
for the derived class.
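A sketch of the distinction (my example):

  struct B {
    B& operator= (const B&);   // user-declared copy assignment for B
  };

  struct D : B {
    using B::operator=;        // copies B, but is not a copy-assignment
  };                           // operator *for D* (parameter is const B&)

  // D's implicit copy operations should therefore not be deprecated on
  // account of the using-declaration alone.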
gcc/cp/ChangeLog:
PR c++/92145
* class.c (classtype_has_depr_implicit_copy): Check DECL_CONTEXT
of operator=.
gcc/testsuite/ChangeLog:
PR c++/92145
* g++.dg/cpp0x/depr-copy3.C: New test.
Jason Merrill [Sun, 4 Jun 2023 16:00:55 +0000 (12:00 -0400)]
c++: NRV and goto [PR92407]
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. A simple fix for the
release branches is to disable the optimization in the presence of backward
goto.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't track that, and in GCC 14
we instead make the goto work with NRV.
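A sketch of the problematic pattern (reduced by hand):

  struct A { A (); ~A (); };
  bool cond ();

  A
  f ()
  {
   again:
    A a;            // NRV would construct 'a' in the return slot ...
    if (cond ())
      goto again;   // ... but this goto must destroy 'a' first
    return a;
  }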
PR c++/92407
gcc/cp/ChangeLog:
* cp-tree.h (struct language_function): Add backward_goto.
* decl.c (check_goto): Set it.
* typeck.c (check_return_expr): Prevent NRV if set.
Patrick Palka [Tue, 25 Apr 2023 19:59:22 +0000 (15:59 -0400)]
c++: value dependence of by-ref lambda capture [PR108975]
We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses. Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.
We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.
To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time. It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.
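A sketch of the generic-lambda case (reduced):

  template<class T>
  void
  f ()
  {
    const int dim = 1;
    [&] {                 // capture of 'dim' should be pruned ...
      int arr[dim];       // ... once these uses are constant folded
      using U = int[dim];
      (void) sizeof (arr);
      (void) sizeof (U);
    } ();
  }

  template void f<int> ();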
PR c++/108975
gcc/cp/ChangeLog:
* pt.c (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.