https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64072

            Bug ID: 64072
           Summary: wrong cgraph node profile count
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wmi at google dot com
                CC: davidxl at gcc dot gnu.org, hubicka at gcc dot gnu.org

We have a program like this:

A() {    // hot func
  ...
}

B() {
  A();    // very hot
  if (i) {
    A();  // very cold
  }
}

Both call sites of A will be inlined into B. In GCC's
save_inline_function_body, during the inline_transform stage, A's first
clone is chosen and materialized. In our case, the chosen clone
node corresponds to the cold call site of A.
cgraph_rebuild_references in tree_function_versioning then resets the
cgraph node count of the chosen clone to the entry bb count of function A
(A is hot). So the cgraph node count of the chosen clone becomes hot
while its inline edge count is still cold. This breaks the assumption
described here:
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:
for inline node, bb->count == edge->count == edge->callee->count
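
To make the assumption concrete, here is a minimal sketch of it as a
check on a toy model. The type and function names (toy_count,
toy_inline_edge, toy_inline_invariant_holds) are hypothetical and are
not GCC's real cgraph data structures; the sketch only mirrors the three
counts named in the assumption above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; these are NOT GCC's cgraph_node/cgraph_edge.  */
typedef int64_t toy_count;

struct toy_inline_edge
{
  toy_count bb_count;      /* count of the basic block holding the call */
  toy_count edge_count;    /* count on the call graph edge */
  toy_count callee_count;  /* count on the inline clone's cgraph node */
};

/* Return true if bb->count == edge->count == edge->callee->count
   holds for this toy inline edge.  */
static bool
toy_inline_invariant_holds (const struct toy_inline_edge *e)
{
  assert (e != NULL);
  return e->bb_count == e->edge_count
	 && e->edge_count == e->callee_count;
}
```

In the scenario described here, the cold call site would have small
bb_count and edge_count but a callee_count reset to A's hot entry
count, so the check would return false.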

With the patch committed in the thread above (it is listed below),
cg_edge->callee->count is used to update the profile of the inline
instance, which makes a BB in function B look hot when it is actually
very cold. The wrong profile information causes a performance regression
in one of our internal benchmarks. Our internal workaround is to change
cg_edge->callee->count to MIN(cg_edge->callee->count, cg_edge->count).
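
The effect of that workaround can be sketched on plain integers. This is
only an illustration under assumed names: toy_profile_count and
toy_inline_entry_count are hypothetical, and the real change would be
made to the count argument passed to copy_body in expand_call_inline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a profile count; not a GCC type.  */
typedef int64_t toy_profile_count;

/* Pick the count used to scale the inlined body: the callee node count,
   clamped so that it never exceeds the count of the inline edge.  This
   mirrors MIN (cg_edge->callee->count, cg_edge->count).  */
static toy_profile_count
toy_inline_entry_count (toy_profile_count callee_count,
			toy_profile_count edge_count)
{
  assert (callee_count >= 0 && edge_count >= 0);
  return callee_count < edge_count ? callee_count : edge_count;
}
```

For the cold call site, a callee count of 1000000 with an edge count of 3
yields 3, so the inlined body stays cold. When the two counts already
agree, the result is unchanged.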

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 210535)
+++ gcc/tree-inline.c (working copy)
@@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
      function in any way before this point, as this CALL_EXPR may be
      a self-referential call; if we're calling ourselves, we need to
      duplicate our body before altering anything.  */
-  copy_body (id, bb->count,
+  copy_body (id, cg_edge->callee->count,
        GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
      bb, return_block, NULL);
