On Tue, Oct 7, 2025 at 12:39 PM Jan Hubicka <[email protected]> wrote: > > > > +/* If basic block is empty, we can remove conditionals that controls > > > + its execution. However in some cases the empty BB can stay live > > > + (such as when it was a header of empty loop). In this case we > > > + need to update its count. Since regions of dead BBs are acyclic > > > + we simply propagate counts forward from live BBs to dead ones. */ > > > > I can't follow how this works. A region of dead BBs might fully > > contain a cycle btw. > > It is usual topological sorting algorithm for DAGs based on reference > counting (which runs only within regions of dead BBs). cnt map holds the > refernece counts (number of incomming edges where profile is not yet > known) and is only populated by BBS which do not have yet final count. > Queue is a worklist (should been called this way since it is stack) of > those having 0 unknown incomming edges. > > My first version just used RPO order of whole CFG, but then I was > thinking what to do with loops of dead BBs. Since dead BBs has only one > exit edge they can only form infinite loops. > > For infinite loops with non-zero entry count we have no way to represent > a profile that is consistent, since profile_count can not represent > infinity. In practice we do not want to have very large counts on > infinite loops. They are either errorneous code paths and should have > count of 0, while large count would make valid part of program cold. Or > they can be waiting loops which are not infinite anyway, but we do not > represent their exit (interrupt). Static profile guesses infinte loop > to have --param max-predicted-iterations. With real profile they either > should be known to be not reached (count 0) or have count representing > their wait time (which is probably not useful and may confuse hot/cold > heuristics). > > In both cases I think it makes sense to simply not touch exisitng > profile of infinite loops in DCE. If it was measured it is still valid; > if it was guessed it is at least capped to not confuse rest of compiler. > > THe algorithm iterating in RPO order would increase counts of the > infinite loops each time it is run. This is why I do reference counting > since that will simply stop at the entry of the loop. > It was not clear to me if I can detect those loops easily in the RPO > walk since they may have multiple entry points, so I went for this > option. > > Honza > > > > > + > > > +static void > > > +propagate_counts () > > > +{ > > > + basic_block bb; > > > + auto_vec<basic_block, 16> queue; > > > + hash_map <basic_block, int> cnt; > > > + > > > + FOR_EACH_BB_FN (bb, cfun) > > > + if (!bitmap_bit_p (bb_contains_live_stmts, bb->index)) > > > + { > > > + int n = 0; > > > + for (edge e : bb->preds) > > > + if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun) > > > + && !bitmap_bit_p (bb_contains_live_stmts, e->src->index)) > > > + n++; > > > + if (!n) > > > + queue.safe_push (bb); > > > > So everything we push has no live stmts itself and all of its incoming > > edge srcs have live stmts (thus the src blocks are all not removed). > > We don't now whether we actually elided BB though, bb->flags & BB_REACHABLE > > says so. > > Unreachable BBs would stop propagation too. I put this after eliding > unreachable BBs so I believe they should be removed. I had asseert that > tested that while bootstrap the algorithm always finishes with updating > all counts (no infinite loops). In testsuite we have plenty of infinite > loops of course. > > I will update the comment. Also I think i can avoid allocation of the > map if I process BBs with 0 unknown in-edges immediately since chained > dead BBs are quite rare in practice. > > Does this make sense?
Yes. I thought that a vec<int> for the count would be better but then I realized you use the hash-map as presence indicator on the queue as well, so we wouldn't save the vec<int> clearing. Richard. > Honza
