Re: Improve profile update in merge_blocks

Richard Biener Tue, 07 Oct 2025 04:49:00 -0700

On Tue, Oct 7, 2025 at 12:39 PM Jan Hubicka <[email protected]> wrote:
>
> > > +/* If basic block is empty, we can remove conditionals that controls
> > > +   its execution.  However in some cases the empty BB can stay live
> > > +   (such as when it was a header of empty loop).  In this case we
> > > +   need to update its count.  Since regions of dead BBs are acyclic
> > > +   we simply propagate counts forward from live BBs to dead ones.  */
> >
> > I can't follow how this works.  A region of dead BBs might fully
> > contain a cycle btw.
>
> It is usual topological sorting algorithm for DAGs based on reference
> counting (which runs only within regions of dead BBs). cnt map holds the
> refernece counts (number of incomming edges where profile is not yet
> known) and is only populated by BBS which do not have yet final count.
> Queue is a worklist (should been called this way since it is stack) of
> those having 0 unknown incomming edges.
>
> My first version just used RPO order of whole CFG, but then I was
> thinking what to do with loops of dead BBs.  Since dead BBs has only one
> exit edge they can only form infinite loops.
>
> For infinite loops with non-zero entry count we have no way to represent
> a profile that is consistent, since profile_count can not represent
> infinity.  In practice we do not want to have very large counts on
> infinite loops.  They are either errorneous code paths and should have
> count of 0, while large count would make valid part of program cold. Or
> they can be waiting loops which are not infinite anyway, but we do not
> represent their exit (interrupt).  Static profile guesses infinte loop
> to have --param max-predicted-iterations.  With real profile they either
> should be known to be not reached (count 0) or have count representing
> their wait time (which is probably not useful and may confuse hot/cold
> heuristics).
>
> In both cases I think it makes sense to simply not touch exisitng
> profile of infinite loops in DCE.  If it was measured it is still valid;
> if it was guessed it is at least capped to not confuse rest of compiler.
>
> THe algorithm iterating in RPO order would increase counts of the
> infinite loops each time it is run. This is why I do reference counting
> since that will simply stop at the entry of the loop.
> It was not clear to me if I can detect those loops easily in the RPO
> walk since they may have multiple entry points, so I went for this
> option.
>
> Honza
> >
> > > +
> > > +static void
> > > +propagate_counts ()
> > > +{
> > > +  basic_block bb;
> > > +  auto_vec<basic_block, 16> queue;
> > > +  hash_map <basic_block, int> cnt;
> > > +
> > > +  FOR_EACH_BB_FN (bb, cfun)
> > > +    if (!bitmap_bit_p (bb_contains_live_stmts, bb->index))
> > > +      {
> > > +       int n = 0;
> > > +       for (edge e : bb->preds)
> > > +         if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
> > > +             && !bitmap_bit_p (bb_contains_live_stmts, e->src->index))
> > > +           n++;
> > > +       if (!n)
> > > +         queue.safe_push (bb);
> >
> > So everything we push has no live stmts itself and all of its incoming
> > edge srcs have live stmts (thus the src blocks are all not removed).
> > We don't now whether we actually elided BB though, bb->flags & BB_REACHABLE
> > says so.
>
> Unreachable BBs would stop propagation too. I put this after eliding
> unreachable BBs so I believe they should be removed.  I had asseert that
> tested that while bootstrap the algorithm always finishes with updating
> all counts (no infinite loops). In testsuite we have plenty of infinite
> loops of course.
>
> I will update the comment.  Also I think i can avoid allocation of the
> map if I process BBs with 0 unknown in-edges immediately since chained
> dead BBs are quite rare in practice.
>
> Does this make sense?


Yes.  I thought that a vec<int> for the count would be better
but then I realized you use the hash-map as presence indicator on the queue
as well, so we wouldn't save the vec<int> clearing.

Richard.

> Honza

Re: Improve profile update in merge_blocks

Reply via email to