On Wed, Aug 28, 2013 at 9:58 AM, Jan Hubicka <hubi...@ucw.cz> wrote: > Hi, > with Martin we did bit of progress on analyzing the problems. We now have > COMDAT profile merging for FDO
Great! Is this the LTO merging you were talking about in an earlier message, or the gcov runtime fix (that would presumably not be lto-specific)? > and we also noticed that forks can make your > basic blocks appear never executed even though they are executed every run: > the fork is accounted as 3 independnet runs of the program. First run > is until fork, the second 2 runs are both variant. > > I have patch to track this. Moreover vforks seems to produce repeated > merging of results. Aha, does this explain the gimp issue as well? > > These two factors seems to help to riddle enought the firefox profiles > so it took us weeks to understand what happens. >> + if (changed) >> + { >> + /* Edge forwarding in particular can cause hot blocks >> previously >> + reached by both hot and cold blocks to become dominated >> only >> + by cold blocks. This will cause the verification >> below to fail, >> + and lead to now cold code in the hot section. This is not >> easy >> + to detect and fix during edge forwarding, and in some cases >> + is only visible after newly unreachable blocks are deleted, >> + which will be done in fixup_partitions. */ >> + fixup_partitions (); > > Is it really necessary to run this from internal loop of the cfgcleanup? The reason I added it here is that just below there is a call to verify_flow_info, and that will fail with the new verification. > It seems > you will play back and forth game where edge forwarding will remove your > fallthru > and you will be re-adding it? fixup_partitions will not add new fall through edges. (It may invoke force_nonfallthru to do the opposite.) So there shouldn't be any ping-ponging effect. > > I would wait for cfgcleanup to finish its job (I don't really think it needs > to be > iterative) and then do the fixup possibly cleaning up when the blocks was > repoisitoined > (I suppose it is only case where the code above introduces new cfgcleanup > oppurtunities). As noted above, I can't do this due to the call to verify_flow_info for each iteration. One option is to move both down outside the loop. > >> + /* Walk the preds/succs and check if there is at least one already >> + marked hot. Keep track of the most frequent pred/succ so that we >> + can mark it hot if we don't find one. */ >> + FOR_EACH_EDGE (e, ei, edges) >> + { >> + basic_block reach_bb = walk_up ? e->src : e->dest; >> + >> + if (e->flags & EDGE_DFS_BACK) >> + continue; >> + >> + if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION) >> + { >> + found = true; >> + break; >> + } >> + if (e->probability > highest_probability) >> + highest_probability = e->probability; > > When doing predecestor walk if you have two predecestors, one executing 100000 > times and getting to the block with probability 1%, you want to chose it over > block executing once and getting to you with probability 100%. > > You probably want to look for most likely predecestor here. You need to look > for highest e->count and if they are all 0 for highest EDGE_FREQUENCY and for > maximal probability only if all fails? Yes, thanks, let me do that. > >> + } >> + >> + /* If bb is reached by (or reaches, in the case of !WALK_UP) another >> hot >> + block (or unpartitioned, e.g. the entry block) then it is ok. If >> not, >> + then the most frequent pred (or succ) needs to be adjusted. In the >> + case where multiple preds/succs have the same probability (e.g. a >> + 50-50 branch), then both will be adjusted. */ >> + if (found) >> + continue; >> + >> + FOR_EACH_EDGE (e, ei, edges) >> + { >> + if (e->flags & EDGE_DFS_BACK) >> + continue; >> + if (e->probability < highest_probability) >> + continue; > > Again for predecestor walk you need to wind down the bit crazy logic > described above. >> Index: predict.c >> =================================================================== >> --- predict.c (revision 202021) >> +++ predict.c (working copy) >> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun >> return false; >> } >> >> + >> +/* Return true in case edge E is probably never executed. */ >> + >> +bool >> +probably_never_executed_edge_p (struct function *fun, edge e) >> +{ >> + gcc_checking_assert (fun); >> + if (profile_info && flag_branch_probabilities) >> + return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; >> + if ((!profile_info || !flag_branch_probabilities) >> + && (cgraph_get_node (fun->decl)->frequency >> + == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >> + return true; >> + return false; > Instead of duplicating the conditional, break out the tests into > probably_never_executed_count_p, like we have for maybe_hot_count_p. ok > > It would be nice to extend this to work w/o profile: > probably_never_executed_edge_p > can return true for EH edges, setjmp edges and we can then walk bodies > ignoring > EH/setjmp flagging blocks that are probably never executed enabling the > partitioning > to do a job w/o profile. agreed. although I would prefer to leave that for a follow-on patch so it can be tuned a bit. > > Otherwise the patch look OK to me. Thanks for working on this. Do we have > agreement on C++ way of mixing > declarations and code? According to http://gcc.gnu.org/wiki/CppConventions: "In new code variables which are used in a small scope should be defined at the point of first use, rather than at the top of the function. Variables which are used throughout the function may be defined at the top of the function, as in C." I think I am following that, but let me know if you see something that needs to be fixed. Thanks, Teresa > > Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413