Jan Hubicka [Wed, 4 Sep 2024 07:19:08 +0000 (09:19 +0200)]
Zen5 tuning part 5: update instruction latencies in x86-tune-costs
There is nothing exciting in this patch. I measured latencies and also compared
them with the newly released optimization guide. There are no dramatic changes
compared to zen4. One interesting new bit is that addss is faster and can be
2 cycles when fed by another addss.
I also increased the large insn bound since decoders no longer seem to require
instructions to be 8 bytes or less.
Richard Biener [Wed, 24 Jul 2024 11:16:35 +0000 (13:16 +0200)]
tree-optimization/116057 - wrong code with CCP and vector CTORs
The following fixes an issue with CCP's likely_value when faced with
a vector CTOR containing undef SSA names and constants. This should
be classified as CONSTANT and not UNDEFINED.
PR tree-optimization/116057
* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
operands to look for constants.
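A toy restatement of the classification rule described above, as a minimal
sketch. The names Elt, Lattice and classify_vector_ctor are hypothetical and
this is not the code of likely_value in tree-ssa-ccp.cc; it only models a CTOR
whose elements are either undefined SSA names or constants, and the all-undef
fallback to UNDEFINED is an assumption of the sketch.

```cpp
#include <cstddef>

// Hypothetical element kinds of a vector CTOR in this toy model.
enum class Elt { UndefName, Constant };
// Hypothetical, reduced lattice: only the two values the message mentions.
enum class Lattice { Undefined, Constant };

// Walk the CTOR elements; a single constant is enough to make the whole
// CTOR CONSTANT, even if other elements are undefined SSA names.
template <std::size_t N>
constexpr Lattice classify_vector_ctor(const Elt (&elts)[N]) {
  for (std::size_t i = 0; i < N; ++i)
    if (elts[i] == Elt::Constant)
      return Lattice::Constant;
  // Assumption of this sketch: only undefined names were seen.
  return Lattice::Undefined;
}

constexpr Elt mixed_ctor[] = {Elt::UndefName, Elt::Constant,
                              Elt::UndefName, Elt::Constant};
constexpr Elt undef_ctor[] = {Elt::UndefName, Elt::UndefName};

static_assert(classify_vector_ctor(mixed_ctor) == Lattice::Constant,
              "undef names mixed with constants classify as CONSTANT");
static_assert(classify_vector_ctor(undef_ctor) == Lattice::Undefined,
              "only undef names stay UNDEFINED in this toy model");
```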
Richard Biener [Thu, 27 Jun 2024 09:26:08 +0000 (11:26 +0200)]
tree-optimization/115669 - fix SLP reduction association
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.
Once we have achieved SLP-only vectorization we can move and update this meta-data.
PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.
Jan Hubicka [Tue, 3 Sep 2024 16:20:34 +0000 (18:20 +0200)]
Zen5 tuning part 4: update reassociation width
Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them. FP units can do 2 additions and 2 multiplications with latency 2
and 3. This patch updates reassociation width accordingly. This has the potential
to increase register pressure but, unlike when benchmarking znver1 tuning,
I did not notice this actually causing problems on SPEC, so this patch bumps
up reassociation width to 6 for everything except for integer vectors, where
there are 4 units with typical latency of 1.
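To illustrate what a reassociation width means, here is a source-level sketch,
assuming integer addition so that regrouping cannot change the result. The
function sum_with_width and the data are hypothetical; GCC performs the real
transformation on its own expression trees, not on source like this. A width
of W splits one long dependency chain into W independent partial sums that can
execute in parallel, at the cost of keeping W accumulators live (the register
pressure mentioned above).

```cpp
#include <cstddef>

// Sum v[] using Width independent accumulators (hypothetical helper).
// Width == 1 is the original serial chain; larger widths expose more
// instruction-level parallelism but need more live registers.
template <std::size_t Width, std::size_t N>
constexpr long sum_with_width(const long (&v)[N]) {
  long acc[Width] = {};
  for (std::size_t i = 0; i < N; ++i)
    acc[i % Width] += v[i];        // each accumulator forms its own chain
  long total = 0;
  for (std::size_t w = 0; w < Width; ++w)
    total += acc[w];               // final combine of the partial sums
  return total;
}

constexpr long demo_values[] = {1, 2, 3, 4, 5, 6, 7, 8};

static_assert(sum_with_width<1>(demo_values) == 36, "serial chain");
static_assert(sum_with_width<6>(demo_values) == 36, "six parallel chains");
```

For floating point the regrouping changes rounding, so the compiler may only
do it when the relevant math flags allow reassociation.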
Jan Hubicka [Tue, 3 Sep 2024 14:26:16 +0000 (16:26 +0200)]
Zen5 tuning part 3: scheduler tweaks
This patch adds support for the new fusion in znver5 documented in the
optimization manual:
The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
with certain ALU instructions. The following conditions need to be met for
fusion to happen:
- The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
- The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
- The ALU instruction may source only registers or immediate data. There cannot be any memory source.
- The ALU instruction sources either the source or dest of MOV instruction.
- If ALU instruction has 2 reg sources, they should be different.
- The following ALU instructions can fuse with an older qualified MOV instruction:
ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
(I assume OP is OR)
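The conditions quoted above can be restated as a small predicate. This is a
minimal sketch under stated assumptions: Op, Insn and can_fuse_mov_alu are
hypothetical names, registers are plain integers, and the opcode restriction
(0x89 or 0x8B) is not modeled. It is not the code of ix86_fuse_mov_alu_p.

```cpp
// Toy instruction model (hypothetical, for illustration only).
enum class Op { Mov, Add, Adc, And, Xor, Or, Sub, Sbb, Inc, Dec, Not,
                Shl, Shr, Sar, Imul };

struct Insn {
  Op op;
  int dest;      // destination register number
  int src1;      // first register source, -1 if none
  int src2;      // second register source, -1 if none (e.g. an immediate)
  bool has_mem;  // true if any operand is a memory reference
};

// ALU instructions listed in the quoted manual text (OP read as OR).
constexpr bool is_fusible_alu(Op op) {
  return op == Op::Add || op == Op::Adc || op == Op::And || op == Op::Xor
         || op == Op::Or || op == Op::Sub || op == Op::Sbb || op == Op::Inc
         || op == Op::Dec || op == Op::Not || op == Op::Shl || op == Op::Shr
         || op == Op::Sar;
}

constexpr bool can_fuse_mov_alu(const Insn &mov, const Insn &alu) {
  // The MOV must be a plain reg-reg move.
  if (mov.op != Op::Mov || mov.has_mem || mov.src1 < 0)
    return false;
  // The follower must be a listed ALU instruction with no memory source.
  if (!is_fusible_alu(alu.op) || alu.has_mem)
    return false;
  // MOV and ALU destination registers must match.
  if (alu.dest != mov.dest)
    return false;
  // The ALU instruction must read the source or the dest of the MOV.
  const bool reads1 = alu.src1 >= 0
                      && (alu.src1 == mov.src1 || alu.src1 == mov.dest);
  const bool reads2 = alu.src2 >= 0
                      && (alu.src2 == mov.src1 || alu.src2 == mov.dest);
  if (!reads1 && !reads2)
    return false;
  // Two register sources must be different registers.
  if (alu.src1 >= 0 && alu.src2 >= 0 && alu.src1 == alu.src2)
    return false;
  return true;
}

// Register numbers are arbitrary labels for this sketch.
constexpr int RBX = 3, RBP = 5, RSI = 6;
constexpr Insn mov_rbx_rbp{Op::Mov, RBP, RBX, -1, false};  // movq %rbx, %rbp
constexpr Insn shr_rbp{Op::Shr, RBP, RBP, -1, false};      // shrq $6, %rbp

static_assert(can_fuse_mov_alu(mov_rbx_rbp, shr_rbp),
              "mov followed by a shift of the same register qualifies");
```

In this model the pair "movq %rbx, %rbp" followed by "shrq $6, %rbp"
qualifies, while an ALU instruction with a memory operand, a different
destination, or two identical register sources does not.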
I also increased issue rate from 4 to 6. Theoretically znver5 can do more, but
with our model we can't really use it.
Increasing issue rate to 8 leads to an infinite loop in the scheduler.
Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).
The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
.cfi_offset 3, -32
leaq 63(%rsi), %rbx
movq %rbx, %rbp
+ shrq $6, %rbp
+ salq $3, %rbp
subq $16, %rsp
.cfi_def_cfa_offset 48
movq %rdi, %r12
- shrq $6, %rbp
- movq %rsi, 8(%rsp)
- salq $3, %rbp
movq %rbp, %rdi
+ movq %rsi, 8(%rsp)
call _Znwm
movq 8(%rsp), %rsi
movl $0, 8(%r12)
@@ -2224,8 +2224,8 @@
movq %rax, (%r12)
movq %rbp, 32(%r12)
testq %rsi, %rsi
- movq %rsi, %rdx
cmovns %rsi, %rbx
+ movq %rsi, %rdx
sarq $63, %rdx
shrq $58, %rdx
sarq $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure an off-noise effect on SPEC.
gcc/ChangeLog:
* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune.
Eric Botcazou [Wed, 6 Sep 2023 07:37:29 +0000 (09:37 +0200)]
ada: Fix internal error on aggregate nested in container aggregate
This handles the case where a component association is present.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): In the case of a
component association, call Is_Container_Aggregate on the parent's
parent.
(Expand_Array_Aggregate): Likewise.
Eric Botcazou [Thu, 25 May 2023 22:09:14 +0000 (00:09 +0200)]
ada: Fix internal error on aggregate within container aggregate
This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
(Expand_Array_Aggregate): Do not delay the expansion if the parent
node is a container aggregate.
Marc Poulhiès [Mon, 27 Mar 2023 14:47:04 +0000 (16:47 +0200)]
ada: Fix crash on vector initialization
Initializing a vector using
Vec : V.Vector := [Some_Type'(Some_Abstract_Type with F => 0)];
may crash the compiler. The expander marks the N_Extension_Aggregate for
delayed expansion which never happens and incorrectly ends up in gigi.
The delayed expansion is needed for nested aggregates, which the
original code is testing for, but container aggregates are handled
differently.
Such assignments to container aggregates are later transformed into
procedure calls to the procedures named in the Aggregate aspect
definition, for which the delayed expansion is not required/expected.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): Do not mark node for
delayed expansion if parent type has the Aggregate aspect.
* sem_util.adb (Is_Container_Aggregate): Move...
* sem_util.ads (Is_Container_Aggregate): ... here and make it
public.
Eric Botcazou [Thu, 12 Dec 2024 15:25:09 +0000 (16:25 +0100)]
Fix precondition failure with Ada.Numerics.Generic_Real_Arrays.Eigenvalues
This fixes a precondition failure triggered when the Eigenvalues routine
of Ada.Numerics.Generic_Real_Arrays is instantiated with -gnata, because
it calls Sort_Eigensystem on an empty vector.
gcc/ada/
PR ada/117996
* libgnat/a-ngrear.adb (Jacobi): Remove default value for
Compute_Vectors formal parameter.
(Sort_Eigensystem): Add Compute_Vectors formal parameter. Do not
modify the Vectors if Compute_Vectors is False.
(Eigensystem): Pass True as Compute_Vectors to Sort_Eigensystem.
(Eigenvalues): Pass False as Compute_Vectors to Sort_Eigensystem.
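The shape of this fix can be sketched as a sort that only permutes the
eigenvectors when asked to. This is a minimal sketch, not a translation of
a-ngrear.adb: EigenSystem and sort_eigensystem are hypothetical names, the
descending order is an assumption of the sketch, and the matrix here always
exists and is simply left alone, whereas in the Ada code the vector argument
is empty when only eigenvalues are requested.

```cpp
#include <cstddef>

// Hypothetical container: column j of vectors belongs to values[j].
template <std::size_t N>
struct EigenSystem {
  double values[N];
  double vectors[N][N];
};

// Selection sort of the eigenvalues in descending order (assumed order).
// The matching columns are swapped only when compute_vectors is true, so
// a caller that wants eigenvalues only never touches the vectors.
template <std::size_t N>
constexpr EigenSystem<N> sort_eigensystem(EigenSystem<N> s,
                                          bool compute_vectors) {
  for (std::size_t i = 0; i + 1 < N; ++i) {
    std::size_t max = i;
    for (std::size_t j = i + 1; j < N; ++j)
      if (s.values[j] > s.values[max])
        max = j;
    if (max == i)
      continue;
    const double v = s.values[i];
    s.values[i] = s.values[max];
    s.values[max] = v;
    if (compute_vectors)
      for (std::size_t r = 0; r < N; ++r) {
        const double t = s.vectors[r][i];
        s.vectors[r][i] = s.vectors[r][max];
        s.vectors[r][max] = t;
      }
  }
  return s;
}

constexpr EigenSystem<3> demo{{1.0, 3.0, 2.0},
                              {{1.0, 2.0, 3.0},
                               {4.0, 5.0, 6.0},
                               {7.0, 8.0, 9.0}}};
constexpr auto values_only = sort_eigensystem(demo, false);
constexpr auto with_vectors = sort_eigensystem(demo, true);

static_assert(values_only.values[0] == 3.0 && values_only.values[2] == 1.0,
              "values are sorted in both modes");
static_assert(values_only.vectors[0][0] == 1.0,
              "vectors are untouched when compute_vectors is false");
static_assert(with_vectors.vectors[0][0] == 2.0,
              "columns follow their eigenvalues when compute_vectors is true");
```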
AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP. This is needed because SP
may be stored in some frame location.
gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().
Simon Martin [Tue, 3 Dec 2024 13:30:43 +0000 (14:30 +0100)]
c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
We currently reject the following valid code:
=== cut here ===
struct Base {
virtual void doit (int v) const {}
};
struct Derived : Base {
void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
fn_t mFn;
constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
constexpr Helper h (&Derived::doit);
}
=== cut here ===
The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.
This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.
PR c++/117615
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.
Andre Vieira [Mon, 2 Dec 2024 13:35:03 +0000 (13:35 +0000)]
arm, mve: Adding missing Runtime Library Exception to header files
Add missing Runtime Library Exception to mve header files to bring them into
line with other similar headers. Not adding it in the first place was an
oversight.
Paul Thomas [Wed, 13 Nov 2024 08:57:55 +0000 (08:57 +0000)]
Fortran: Fix failing character pointer fcn assignment [PR105054]
2024-11-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/105054
* resolve.cc (get_temp_from_expr): If the pointer function has
a deferred character length, generate a new deferred charlen
for the temporary.
gcc/testsuite/
PR fortran/105054
* gfortran.dg/ptr_func_assign_6.f08: New test.
Martin Jambor [Fri, 15 Nov 2024 13:37:06 +0000 (14:37 +0100)]
tree-sra: Avoid SRAing arguments to a function returning_twice (PR 117142)
This is a manual backport of commit 29d8f1f0b7ad3c69b3bdb130325300d5f73aa784,
which must be done in a slightly different place for gcc 13 and 12 because function
build_access_from_call_arg was added only in gcc 14.
But the gist of the patch is the same. The commit message of the
original fix says:
PR 117142 shows that the current SRA probably never worked reliably
with arguments passed to a function returning twice, because it then
creates statements before the call which however needs to be at the
beginning of a basic block.
While it should be possible to make at least the case of passing
arguments by value work with SRA (the statements would need to be put
just on the non-abnormal edges leading to the BB), this would mean
large surgery of function sra_modify_expr and I guess the time would
better be spent re-organizing the whole pass.
gcc/ChangeLog:
2024-11-14 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117142
* tree-sra.cc (scan_function): Disqualify any candidate passed to
a function returning twice.
Paul Thomas [Tue, 26 Nov 2024 08:58:21 +0000 (08:58 +0000)]
Fortran: Partial reversion of r15-5083 [PR117763]
2024-11-26 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/117763
* trans-array.cc (gfc_get_array_span): Guard against dereferences
of 'expr'. Clean up some typos. Use 'gfc_get_vptr_from_expr'
for clarity and apply a functional reversion of last section
that deals with class dummies.
gcc/testsuite/
PR fortran/117763
* gfortran.dg/pr117763.f90: New test.
Arsen Arsenović [Thu, 15 Aug 2024 17:17:41 +0000 (19:17 +0200)]
gnat: fix lto-type-mismatch between C_Version_String and gnat_version_string [PR115917]
gcc/ada/ChangeLog:
PR ada/115917
* gnatvsn.ads: Add note about the duplication of this value in
version.c.
* version.c (VER_LEN_MAX): Define to the same value as
Gnatvsn.Ver_Len_Max.
(gnat_version_string): Use VER_LEN_MAX as bound.
Paul Thomas [Mon, 11 Nov 2024 12:21:57 +0000 (12:21 +0000)]
Fortran: Fix elemental array refs in SELECT TYPE [PR109345]
2024-11-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/109345
* trans-array.cc (gfc_get_array_span): Unlimited polymorphic
expressions are now treated separately since the span need not
be the same as the element size.
Georg-Johann Lay [Sat, 23 Nov 2024 11:51:32 +0000 (12:51 +0100)]
AVR: target/117744 - Fix asm for partial clobber of address reg
gcc/
PR target/117744
* config/avr/avr.cc (out_movqi_r_mr): Fix code when a load
only partially clobbers an address register due to
changing the address register temporarily to accommodate
faked addressing modes.
Uros Bizjak [Mon, 18 Nov 2024 21:38:46 +0000 (22:38 +0100)]
i386: Enable *rsqrtsf2_sse without TARGET_SSE_MATH [PR117357]
The __builtin_ia32_rsqrtsf2 expander generates the UNSPEC_RSQRT insn pattern
even when TARGET_SSE_MATH is not set. Enable *rsqrtsf2_sse without
TARGET_SSE_MATH to avoid an ICE with an unrecognizable insn.
PR target/117357
gcc/ChangeLog:
* config/i386/i386.md (*rsqrtsf2_sse):
Also enable for !TARGET_SSE_MATH.