Eugene Rozenfeld [Sat, 11 Jan 2025 03:48:52 +0000 (19:48 -0800)]
Fix setting of call graph node AutoFDO count
We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.
Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:
profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);
Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in
if (scale)
bb->count = bb->count.apply_scale (num, den);
This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests). 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.
The fix is to update the call graph node count once we've done count propagation.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
PR gcov-profile/116743
* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
and the entry block count.
Nathaniel Shead [Fri, 20 Dec 2024 11:09:39 +0000 (22:09 +1100)]
c++: Allow pragmas in NSDMIs [PR118147]
This patch removes the (unnecessary) CPP_PRAGMA_EOL case from
cp_parser_cache_defarg, which currently has the result that any pragmas
in the NSDMI cause an error.
PR c++/118147
gcc/cp/ChangeLog:
* parser.cc (cp_parser_cache_defarg): Don't error when
CPP_PRAGMA_EOL.
Richard Biener [Tue, 12 Nov 2024 10:15:15 +0000 (11:15 +0100)]
tree-optimization/117417 - ICE with complex load optimization
When we decompose a complex load only used as real and imaginary
parts we fail to honor IL constraints which are that a BIT_FIELD_REF
of register type should be outermost in a ref. The following
simply avoids the transform when the complex load has such a
BIT_FIELD_REF.
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as union of all
group members and when the group is later split due to duplicates
not all sub-groups inherit the flag.
PR tree-optimization/117307
* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
Properly compute STMT_VINFO_SLP_VECT_ONLY. Set it on all
parts of a split group.
Richard Biener [Sat, 12 Oct 2024 12:51:37 +0000 (14:51 +0200)]
tree-optimization/117104 - add missed guards to max(a,b) != a simplification
For vector types we have to make sure the comparison result is a vector
type and the resulting compare operation is supported. As the resulting
compare is never an equality compare I didn't bother to check for the
cbranch case.
Jakub Jelinek [Tue, 15 Oct 2024 17:38:46 +0000 (19:38 +0200)]
match.pd: Further fma negation fixes [PR116891]
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> > PR middle-end/116891
> > * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> > Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.
I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.
Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types. The generic code doesn't seem to handle
loop invariant dependences. The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.
PR tree-optimization/116768
* tree-data-ref.cc (build_classic_dist_vector_1): Revert
PR101009 change.
* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
and (int)1 compare equal.
Richard Biener [Sun, 13 Oct 2024 13:12:44 +0000 (15:12 +0200)]
tree-optimization/116290 - fix compare-debug issue in ldist
Loop distribution does different analysis with -g0/-g due to counting
a debug stmt starting a BB against a limit which will everntually
lead to different IVOPTs choices. I've fixed a possible IVOPTs
issue on the way even though it doesn't make a difference here.
PR tree-optimization/116290
* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
have no debug variants. Start with first non-debug real stmt.
* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
debug stmts.
Richard Biener [Mon, 9 Jan 2023 11:46:28 +0000 (12:46 +0100)]
middle-end/69482 - not preserving volatile accesses
The following addresses a long standing issue with not preserving
accesses to non-volatile objects through volatile qualified
pointers in the case that object gets expanded to a register. The
fix is to treat accesses to an object with a volatile qualified
access as forcing that object to memory. This issue got more
exposed recently so it regressed more since GCC 11.
PR middle-end/69482
* cfgexpand.cc (discover_nonconstant_array_refs_r): Volatile
qualified accesses also force objects to memory.
* gcc.target/i386/pr69482-1.c: New testcase.
* gcc.target/i386/pr69482-2.c: Likewise.
Jan Hubicka [Wed, 4 Sep 2024 07:19:08 +0000 (09:19 +0200)]
Zen5 tuning part 5: update instruction latencies in x86-tune-costs
there is nothing exciting in this patch. I measured latencies and also compared
them with newly released optimization guide. There are no dramatic changes
compared to zen4. One interesting new bit is that addss is faster and can be
2 cycles when fed by another addss.
I also increased the large insn bound since decoders seems no longer require
instructions to be 8 bytes or less.
Richard Biener [Wed, 24 Jul 2024 11:16:35 +0000 (13:16 +0200)]
tree-optimization/116057 - wrong code with CCP and vector CTORs
The following fixes an issue with CCPs likely_value when faced with
a vector CTOR containing undef SSA names and constants. This should
be classified as CONSTANT and not UNDEFINED.
PR tree-optimization/116057
* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
operands to look for constants.
Richard Biener [Thu, 27 Jun 2024 09:26:08 +0000 (11:26 +0200)]
tree-optimization/115669 - fix SLP reduction association
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.
When we achieved SLP only we can move and update this meta-data.
PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.
Jan Hubicka [Tue, 3 Sep 2024 16:20:34 +0000 (18:20 +0200)]
Zen5 tuning part 4: update reassocation width
Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them. FP units can do 2 additions and 2 multiplications with latency 2
and 3. This patch updates reassociation width accordingly. This has potential
of increasing register pressure but unlike while benchmarking znver1 tuning
I did not noticed this actually causing problem on spec, so this patch bumps
up reassociation width to 6 for everything except for integer vectors, where
there are 4 units with typical latency of 1.
Jan Hubicka [Tue, 3 Sep 2024 14:26:16 +0000 (16:26 +0200)]
Zen5 tuning part 3: scheduler tweaks
this patch adds support for new fussion in znver5 documented in the
optimization manual:
The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
with certain ALU instructions. The following conditions need to be met for
fusion to happen:
- The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
- The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
- The ALU instruction may source only registers or immediate data. There cannot be any memory source.
- The ALU instruction sources either the source or dest of MOV instruction.
- If ALU instruction has 2 reg sources, they should be different.
- The following ALU instructions can fuse with an older qualified MOV instruction:
ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
(I assume OP is OR)
I also increased issue rate from 4 to 6. Theoretically znver5 can do more, but
with our model we can't realy use it.
Increasing issue rate to 8 leads to infinite loop in scheduler.
Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).
New fussion pattern moves quite few instructions around in common code:
@@ -2210,13 +2210,13 @@
.cfi_offset 3, -32
leaq 63(%rsi), %rbx
movq %rbx, %rbp
+ shrq $6, %rbp
+ salq $3, %rbp
subq $16, %rsp
.cfi_def_cfa_offset 48
movq %rdi, %r12
- shrq $6, %rbp
- movq %rsi, 8(%rsp)
- salq $3, %rbp
movq %rbp, %rdi
+ movq %rsi, 8(%rsp)
call _Znwm
movq 8(%rsp), %rsi
movl $0, 8(%r12)
@@ -2224,8 +2224,8 @@
movq %rax, (%r12)
movq %rbp, 32(%r12)
testq %rsi, %rsi
- movq %rsi, %rdx
cmovns %rsi, %rbx
+ movq %rsi, %rdx
sarq $63, %rdx
shrq $58, %rdx
sarq $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure off-noise effect on SPEC.
gcc/ChangeLog:
* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Updat for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune;
Eric Botcazou [Wed, 6 Sep 2023 07:37:29 +0000 (09:37 +0200)]
ada: Fix internal error on aggregate nested in container aggregate
This handles the case where a component association is present.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): In the case of a
component association, call Is_Container_Aggregate on the parent's
parent.
(Expand_Array_Aggregate): Likewise.
Eric Botcazou [Thu, 25 May 2023 22:09:14 +0000 (00:09 +0200)]
ada: Fix internal error on aggregate within container aggregate
This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
(Expand_Array_Aggregate): Do not delay the expansion if the parent
node is a container aggregate.
Marc Poulhiès [Mon, 27 Mar 2023 14:47:04 +0000 (16:47 +0200)]
ada: Fix crash on vector initialization
Initializing a vector using
Vec : V.Vector := [Some_Type'(Some_Abstract_Type with F => 0)];
may crash the compiler. The expander marks the N_Extension_Aggregate for
delayed expansion which never happens and incorrectly ends up in gigi.
The delayed expansion is needed for nested aggregates, which the
original code is testing for, but container aggregates are handled
differently.
Such assignments to container aggregates are later transformed into
procedure calls to the procedures named in the Aggregate aspect
definition, for which the delayed expansion is not required/expected.
gcc/ada/
PR ada/118234
* exp_aggr.adb (Convert_To_Assignments): Do not mark node for
delayed expansion if parent type has the Aggregate aspect.
* sem_util.adb (Is_Container_Aggregate): Move...
* sem_util.ads (Is_Container_Aggregate): ... here and make it
public.
Eric Botcazou [Thu, 12 Dec 2024 15:25:09 +0000 (16:25 +0100)]
Fix precondition failure with Ada.Numerics.Generic_Real_Arrays.Eigenvalues
This fixes a precondition failure triggered when the Eigenvalues routine
of Ada.Numerics.Generic_Real_Arrays is instantiated with -gnata, beause
it calls Sort_Eigensystem on an empty vector.
gcc/ada
PR ada/117996
* libgnat/a-ngrear.adb (Jacobi): Remove default value for
Compute_Vectors formal parameter.
(Sort_Eigensystem): Add Compute_Vectors formal parameter. Do not
modify the Vectors if Compute_Vectors is False.
(Eigensystem): Pass True as Compute_Vectors to Sort_Eigensystem.
(Eigenvalues): Pass False as Compute_Vectors to Sort_Eigensystem.
AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP. This is needed because SP
my be stored in some frame location.
gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().
Simon Martin [Tue, 3 Dec 2024 13:30:43 +0000 (14:30 +0100)]
c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
We currently reject the following valid code:
=== cut here ===
struct Base {
virtual void doit (int v) const {}
};
struct Derived : Base {
void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
fn_t mFn;
constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
constexpr Helper h (&Derived::doit);
}
=== cut here ===
The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.
This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.
PR c++/117615
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.
Andre Vieira [Mon, 2 Dec 2024 13:35:03 +0000 (13:35 +0000)]
arm, mve: Adding missing Runtime Library Exception to header files
Add missing Runtime Library Exception to mve header files to bring them into
line with other similar headers. Not adding it in the first place was an
oversight.
Paul Thomas [Wed, 13 Nov 2024 08:57:55 +0000 (08:57 +0000)]
Fortran: Fix failing character pointer fcn assignment [PR105054]
2024-11-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/105054
* resolve.cc (get_temp_from_expr): If the pointer function has
a deferred character length, generate a new deferred charlen
for the temporary.
gcc/testsuite/
PR fortran/105054
* gfortran.dg/ptr_func_assign_6.f08: New test.
Martin Jambor [Fri, 15 Nov 2024 13:37:06 +0000 (14:37 +0100)]
tree-sra: Avoid SRAing arguments to a function returning_twice (PR 117142)
This is a manual bacport of commit 29d8f1f0b7ad3c69b3bdb130325300d5f73aa784 which must be done slightly
elsewhere for gcc 13 and 12 because function
build_access_from_call_arg was added only in gcc 14.
But the gist of the patch is the same. The commit message of the
original fix says:
PR 117142 shows that the current SRA probably never worked reliably
with arguments passed to a function returning twice, because it then
creates statements before the call which however needs to be at the
beginning of a basic block.
While it should be possible to make at least the case of passing
arguments by value work with SRA (the statements would need to be put
just on the non-abnormal edges leading to the BB), this would mean
large surgery of function sra_modify_expr and I guess the time would
better be spent re-organizing the whole pass.
gcc/ChangeLog:
2024-11-14 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117142
* tree-sra.cc (scan_function): Disqualify any candidate passed to
a function returning twice.
Paul Thomas [Tue, 26 Nov 2024 08:58:21 +0000 (08:58 +0000)]
Fortran: Partial reversion of r15-5083 [PR117763]
2024-11-26 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/117763
* trans-array.cc (gfc_get_array_span): Guard against derefences
of 'expr'. Clean up some typos. Use 'gfc_get_vptr_from_expr'
for clarity and apply a functional reversion of last section
that deals with class dummies.
gcc/testsuite/
PR fortran/117763
* gfortran.dg/pr117763.f90: New test.
Arsen Arsenović [Thu, 15 Aug 2024 17:17:41 +0000 (19:17 +0200)]
gnat: fix lto-type-mismatch between C_Version_String and gnat_version_string [PR115917]
gcc/ada/ChangeLog:
PR ada/115917
* gnatvsn.ads: Add note about the duplication of this value in
version.c.
* version.c (VER_LEN_MAX): Define to the same value as
Gnatvsn.Ver_Len_Max.
(gnat_version_string): Use VER_LEN_MAX as bound.
Paul Thomas [Mon, 11 Nov 2024 12:21:57 +0000 (12:21 +0000)]
Fortran: Fix elemental array refs in SELECT TYPE [PR109345]
2024-11-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/109345
* trans-array.cc (gfc_get_array_span): Unlimited polymorphic
expressions are now treated separately since the span need not
be the same as the element size.
Georg-Johann Lay [Sat, 23 Nov 2024 11:51:32 +0000 (12:51 +0100)]
AVR: target/117744 - Fix asm for partial clobber of address reg,
gcc/
PR target/117744
* config/avr/avr.cc (out_movqi_r_mr): Fix code when a load
only partially clobbers an address register due to
changing the address register temporally to accommodate for
faked addressing modes.
Uros Bizjak [Mon, 18 Nov 2024 21:38:46 +0000 (22:38 +0100)]
i386: Enable *rsqrtsf2_sse without TARGET_SSE_MATH [PR117357]
__builtin_ia32_rsqrtsf2 expander generates UNSPEC_RSQRT insn pattern
also when TARGET_SSE_MATH is not set. Enable *rsqrtsf2_sse without
TARGET_SSE_MATH to avoid ICE with unrecognizable insn.
PR target/117357
gcc/ChangeLog:
* config/i386/i386.md (*rsqrtsf2_sse):
Also enable for !TARGET_SSE_MATH.