We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.
Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:
profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);
Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in
if (scale)
bb->count = bb->count.apply_scale (num, den);
This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.
The fix is to update the call graph node count once we've done count propagation.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
PR gcov-profile/116743
* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
and the entry block count.
(cherry picked from commit
e683c6b029f809c7a1981b4341c95d9652c22e18)
if (s == NULL)
return;
- cgraph_node::get (current_function_decl)->count
- = profile_count::from_gcov_type (s->head_count ()).afdo ();
ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
= profile_count::from_gcov_type (s->head_count ()).afdo ();
EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo ();
/* Calculate, propagate count and probability information on CFG. */
afdo_calculate_branch_prob (&annotated_bb);
}
+ cgraph_node::get(current_function_decl)->count
+ = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
update_max_bb_count ();
profile_status_for_fn (cfun) = PROFILE_READ;
if (flag_value_profile_transformations)