https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64072
            Bug ID: 64072
           Summary: wrong cgraph node profile count
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wmi at google dot com
                CC: davidxl at gcc dot gnu.org, hubicka at gcc dot gnu.org

We have a program like this:

    A() {  // hot func
      ...
    }

    B() {
      A();  // very hot
      if (i) {
        A();  // very cold
      }
    }

Both callsites of A will be inlined into B. In GCC, in func
save_inline_function_body during the inline_transform stage, A's first clone
will be chosen and materialized. In our case, the chosen clone node
corresponds to the cold callsite of A. cgraph_rebuild_references in
tree_function_versioning will reset the cgraph node count of the chosen clone
to the entry bb count of func A (A is hot). So the cgraph node count of the
chosen clone becomes hot while its inline edge count is still cold. This
breaks the assumption described here:
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:

    for inline node, bb->count == edge->count == edge->callee->count

The patch committed in the thread above (listed below) uses
cg_edge->callee->count to update the profile of the inline instance. As a
result, a BB in func B is marked hot when it is actually very cold. The wrong
profile information causes a performance regression in one of our internal
benchmarks.

Our internal workaround is to change cg_edge->callee->count to
MIN(cg_edge->callee->count, cg_edge->count).

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 210535)
+++ gcc/tree-inline.c (working copy)
@@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
      function in any way before this point, as this CALL_EXPR may be
      a self-referential call; if we're calling ourselves, we need to
      duplicate our body before altering anything.  */
-  copy_body (id, bb->count,
+  copy_body (id, cg_edge->callee->count,
             GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
             bb, return_block, NULL);
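
To illustrate the MIN workaround mentioned in this report, here is a minimal standalone sketch. It is not GCC code: node_sketch, edge_sketch, MIN_SKETCH and both helper functions are hypothetical stand-ins that model only the two counts involved, and the numbers used with them are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the cgraph node and edge; only the profile
   counts discussed in this report are modelled.  */
typedef int64_t count_sketch;

struct node_sketch
{
  count_sketch count;           /* stands in for cg_edge->callee->count */
};

struct edge_sketch
{
  count_sketch count;           /* stands in for cg_edge->count */
  struct node_sketch *callee;
};

#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Count that would be passed to copy_body with the committed patch:
   the callee node count, which may have been reset to A's hot entry count.  */
static count_sketch
count_from_callee (const struct edge_sketch *e)
{
  return e->callee->count;
}

/* Count that would be passed with the workaround: never larger than the
   count of the inline edge itself.  */
static count_sketch
count_clamped (const struct edge_sketch *e)
{
  return MIN_SKETCH (e->callee->count, e->count);
}
```

For a cold edge (say count 10) whose callee clone was reset to a hot count (say 1000000), count_from_callee returns 1000000 while count_clamped returns 10, so the inlined body keeps the cold count of its callsite.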