On Sun, Apr 17, 2011 at 8:29 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> Hi, please review the attached patch.
>>
>> Ok when bootstrap and test finish?
>>
>> Thanks,
>>
>> David
>>
>>
>> 2011-04-07  Xinliang David Li  <davi...@google.com>
>>
>>         * ipa-cp.c (ipcp_update_profiling): Correct
>>         negative scale factor due to insane profile data.
>
>> Index: ipa-cp.c
>> ===================================================================
>> --- ipa-cp.c    (revision 171917)
>> +++ ipa-cp.c    (working copy)
>> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
>>            scale = ipcp_get_node_scale (orig_node);
>>            node->count = orig_node->count * scale / REG_BR_PROB_BASE;
>>            scale_complement = REG_BR_PROB_BASE - scale;
>> +
>> +          /* Negative scale complement can result from insane profile data
>> +             in which the total incoming edge count in this module is
>> +             larger than the callee's entry count.  The insane profile data
>> +             usually gets generated for the following reasons:
>> +
>> +             1) In multithreaded programs, when profile data is dumped
>> +             to gcda files in gcov_exit, some other threads are still
>> +             running.  The profile counters are dumped in bottom-up order
>> +             (call graph).  The caller's BB counters may still be updated
>> +             while the callee's counter data is already saved to disk.
>> +
>> +             2) Comdat functions: comdat functions' profile data are not
>> +             allocated in comdat.  When a comdat callee function gets
>> +             inlined at some callsites after instrumentation, and the
>> +             remaining calls to this function resolve to a comdat copy in
>> +             another module, the profile counters for this function are
>> +             split.  This can result in the sum of incoming edge counts
>> +             from this module being larger than the callee instance's
>> +             entry count.  */
>> +
>> +          if (scale_complement < 0 && flag_profile_correction)
>
> Please don't use flag_profile_correction in code like this.  Even with
> !flag_profile_correction the profile is sane only just at the time it is
> read.
> The compiler invalidates it for various non-trivial reasons during code
> transformation.  Think of inlining
>
>   int t(int a)
>   {
>     if (a < 0)
>       return 1;
>     else
>       return 2;
>   }
>
> that is called once with constant -1 and once with constant 0.  The profile
> says that the branch has probability 1/2, but in one inline copy it has
> probability 0 and in the other copy 1.  We simply inline with probability
> 1/2 and end up with an insane profile in the caller.
Yes -- this is the insanity due to profile update without context
sensitivity, which is different from the scenario that motivates the change,
namely insanity caused by profile collection.  Dropping the flag check is
fine.

> I would suggest, for brevity, also dropping the comment on why the profile
> can be insane.  We have many places like this; even if it might look a bit
> distracting at first, we probably could document this somewhere in the
> internal documentation.  Can't think of a better place to stick it.
>
> However, the test seems a bit symptomatic to me.  It always matches when
> scale > REG_BR_PROB_BASE, and such a scale is already nonsensical (i.e. a
> clone should not be executed more often than its original).  It is computed
> in:
>
>   /* Compute the proper scale for NODE.  It is the ratio between the number of
>      direct calls (represented on the incoming cgraph_edges) and sum of all
>      invocations of NODE (represented as count in cgraph_node).
>
>      FIXME: This code is wrong.  Since the callers can be also clones and
>      the clones are not scaled yet, the sums gets unrealistically high.
>      To properly compute the counts, we would need to do propagation across
>      callgraph (as external call to A might imply call to non-cloned B
>      if A's clone calls cloned B).  */
>   static void
>   ipcp_compute_node_scale (struct cgraph_node *node)
>   {
>     gcov_type sum;
>     struct cgraph_edge *cs;
>
>     sum = 0;
>     /* Compute sum of all counts of callers. */
>     for (cs = node->callers; cs != NULL; cs = cs->next_caller)
>       sum += cs->count;
>     /* Work around the unrealistically high sum problem.  We just don't want
>        the non-cloned body to have negative or very low frequency.  Since
>        majority of execution time will be spent in clones anyway, this should
>        give good enough profile.
>        */
>     if (sum > node->count * 9 / 10)
>       sum = node->count * 9 / 10;
>     if (node->count == 0)
>       ipcp_set_node_scale (node, 0);
>     else
>       ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
>   }
>
> We already test that sum is at most 9/10th of the node count, and then the
> scale is computed as sum * REG_BR_PROB_BASE / node->count, so the scale
> should not exceed REG_BR_PROB_BASE * 9 / 10.
>
> How can it possibly end up being greater than REG_BR_PROB_BASE at the time
> it is used?

Good point.  The change above was written against 4.4.3, where the sum
computation has no capping like the above.  Yes, with the current capping,
the negative value can't result.  However, it is still good to guard the
scale independently of the implementation of ipcp_compute_node_scale -- it
may change and break silently (the comment of the function says the
implementation is wrong...).

Thanks,

David

>
> Honza