> > The estimated profile is already there before reading the profile in, so we
> > just do not want to overwrite it.  Does the following work for you?
>
> It does get the estimated frequencies on the bbs.

Good.

> > We will also need to solve the problem that in this case cgraph edges
> > will have 0 profile.
> > We probably want to play the game there and just do the scaling for edge
> > count,
> > since IPA passes probably do not want to care about partial profiles.
>
> The problem I am having is that my new freqs_to_counts routine takes
> the sum of the incoming edge counts and computes the bb counts by
> scaling by the estimated bb frequency. But in the case where the
> caller was also missing profile data the edge count will have 0
> profile as you note above, so the counts remain 0. I am not quite sure
> how to handle this. Instead of using the sum of the incoming profile
> counts, I could just pick a default profile count to use as the entry
> bb count. The synthesized counts should get scaled down appropriately
> during inlining.

I am having trouble thinking of what default we should pick here.
Thinking about the whole problem, I think the following sounds like a
possible solution:

 1) Use the patch above to get frequencies computed where counts are 0.
    Make sure that branch probabilities are guessed for functions with
    non-0 counts, for each branch that has count 0.
 2) At the point where we see an inconsistency in the IPA profile (i.e. we
    find direct calls with non-0 counts to a COMDAT with 0 count), drop
    the profile_status for them to profile_guessed and let the backend
    optimize them as if there was no FDO.  I do not think we can do better
    for those.

    If we do not see this problem, we can keep the status as READ so the
    function will get size optimized.  Here we will lose for indirect
    calls, but I do not see how to handle these short of making every
    0 count COMDAT guessed.  Perhaps you can try how these variants work
    for you performance/code size wise.
 3) At the time we inline a function with guessed profile into a function
    with read profile, we will use your trick to feed the counts in based
    on frequencies.  We will do that in tree-inline when inlining.
    I.e. we will never feed fake counts into non-inline functions.
 4) We will want to sanitize the callgraph, too.  This means looking for
    GUESSED profile functions and feeding their counts/edge counts based
    on the sum of the incoming counts.  We can do this as part of the
    ipa-profile pass (I will move it to a separate file as it is getting
    more complex).  Here we will have the propagation issue you mention
    above.  Either we can ignore it or we can iterate.

3) and 4) will absolutely need some capping to be sure we won't make the
synthesized counts bigger than other counts in the program.

I think we have code for 1-3.  If the plan looks sane to you, I think we
can start getting it in?

Of course we want to eliminate most of the problems by getting the runtime
to do the merging...  (but the merging will not always be possible due to
cfg differences coming from different optimization flags, anyway).

Honza
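
For illustration only, here is a minimal sketch in C of the scaling and
capping discussed in 3) and 4).  It is not the code from the patch: the
thread does not show the body of freqs_to_counts, so the names sketch_bb,
sketch_count, sketch_sum_incoming and sketch_freqs_to_counts are all
hypothetical, and the struct is a stand-in for GCC's real basic block.
The assumption is that the entry count is the sum of the incoming call
edge counts, each block gets that count scaled by its estimated frequency
relative to the entry block, and the result is clamped to a cap chosen by
the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not GCC's real types.  */
typedef int64_t sketch_count;

struct sketch_bb
{
  int frequency;        /* estimated frequency; bbs[0] is the entry block */
  sketch_count count;   /* 0 when no profile data was read */
};

/* Sum the counts of the incoming call edges (assumed entry count).  */
static sketch_count
sketch_sum_incoming (const sketch_count *edge_counts, size_t n)
{
  sketch_count sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += edge_counts[i];
  return sum;
}

/* Give each block ENTRY_COUNT scaled by its frequency relative to the
   entry block, rounded to nearest and clamped to CAP.  Assumes the
   product entry_count * frequency fits in 64 bits.  */
static void
sketch_freqs_to_counts (struct sketch_bb *bbs, size_t n,
                        sketch_count entry_count, sketch_count cap)
{
  if (n == 0 || bbs[0].frequency <= 0)
    return;

  int entry_freq = bbs[0].frequency;
  for (size_t i = 0; i < n; i++)
    {
      sketch_count c
        = (entry_count * bbs[i].frequency + entry_freq / 2) / entry_freq;
      bbs[i].count = c > cap ? cap : c;
    }
}
```

With callers contributing 150 and 50, a block ten times hotter than the
entry would get 2000 before capping.  If every caller also lacks profile
data the sum is 0 and all counts stay 0, which is the propagation issue
mentioned in 4).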