Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

Richard Biener via Gcc-patches Mon, 30 Aug 2021 02:19:25 -0700

On Mon, 30 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/27 15:45, Richard Biener wrote:
> > On Thu, 26 Aug 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/8/24 16:20, Richard Biener wrote:
> >>> On Tue, 24 Aug 2021, Xionghu Luo wrote:
> >>>
> >>>>
> >>>>
> >>>> On 2021/8/19 20:11, Richard Biener wrote:
> >>>>>> -  class loop *inn_loop = loop;
> >>>>>>     
> >>>>>>       if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> >>>>>>         {
> >>>>>> @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop,
> >>>>>> @@ sbitmap contains_call)
> >>>>>>           to disprove this if possible).  */
> >>>>>>        if (bb->flags & BB_IRREDUCIBLE_LOOP)
> >>>>>>            break;
> >>>>>> -
> >>>>>> -        if (!flow_bb_inside_loop_p (inn_loop, bb))
> >>>>>> -          break;
> >>>>>> -
> >>>>>> -        if (bb->loop_father->header == bb)
> >>>>>> -          {
> >>>>>> -            if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>>>> -              break;
> >>>>>> -
> >>>>>> -            /* In a loop that is always entered we may proceed
> >>>>>> anyway.
> >>>>>> -               But record that we entered it and stop once we leave
> >>>>>> it.  */
> >>>>>> -            inn_loop = bb->loop_father;
> >>>>>> -          }
> >>>>>>      }
> >>>>>>     
> >>>>>>           while (1)
> >>>>> I'm not sure this will work correct (I'm not sure how the existing
> >>>>> code makes it so either...).  That said, I can't poke any hole
> >>>>> into the change.  What I see is that definitely
> >>>>>
> >>>>>              if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>>>                last = bb;
> >>>>>
> >>>>>              if (bitmap_bit_p (contains_call, bb->index))
> >>>>>                break;
> >>>>>
> >>>>> doesn't work reliably since the DOM ordering will process blocks
> >>>>> A B and C in random order for
> >>>>>
> >>>>>      for (;;)
> >>>>>       {
> >>>>>          if (cond)
> >>>>>            {
> >>>>>              A: foo ();
> >>>>>            }
> >>>>>          else B:;
> >>>>>          C:;
> >>>>>       }
> >>>>>
> >>>>> and thus we can end up setting 'last' to C_before_  processing
> >>>>> 'A' and thus arriving at the call foo () ...
> >>>>>
> >>>>> get_loop_body_in_dom_order does some "special sauce" but not
> >>>>> to address the above problem - but it might be that a subtle
> >>>>> issue like the above is the reason for the inner loop handling.
> >>>>> The inner loop block order does_not_  adhere to this "special sauce",
> >>>>> that is - the "Additionally, if a basic block s dominates
> >>>>> the latch, then only blocks dominated by s are be after it."
> >>>>> guarantee holds for the outer loop latch, not for the inner.
> >>>>>
> >>>>> Digging into the history of fill_always_executed_in_1 doesn't
> >>>>> reveal anything - the inner loop handling has been present
> >>>>> since introduction by Zdenek - but usually Zdenek has a reason
> >>>>> for doing things as he does;)
> >>>>
> >>>> Yes, this is really complicated usage, thanks for point it out. :)
> >>>> I constructed two cases to verify this with inner loop includes "If A;
> >>>> else B; C".
> >>>> Finding that fill_sons_in_loop in get_loop_body_in_dom_order will also
> >>>> checks
> >>>> whether the bb domintes outer loop’s latch, if C dominate outer loop’s
> >>>> latch,
> >>>> C is postponed, the access order is ABC, 'last' won’t be set to C if A or
> >>>> B contains call;
> >>>
> >>> But it depends on the order of visiting ABC and that's hard to put into
> >>> a testcase since it depends on the order of edges and the processing
> >>> of the dominance computation.  ABC are simply unordered with respect
> >>> to a dominator walk.
> >>>
> >>>> Otherwise if C doesn’t dominate outer loop’s latch in fill_sons_in_loop,
> >>>> the access order is CAB, but 'last' also won’t be updated to C in
> >>>> fill_always_executed_in_1
> >>>> since there is also dominate check, then if A or B contains call, it
> >>>> could break
> >>>> successfully.
> >>>>
> >>>> C won't be set to ALWAYS EXECUTED for both circumstance.
> >>>>
> >>>>>
> >>>>> Note it might be simply a measure against quadratic complexity,
> >>>>> esp. since with your patch we also dive into not always executed
> >>>>> subloops as you remove the
> >>>>>
> >>>>>                  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>>>                    break;
> >>>>>
> >>>>> check.  I suggest to evaluate behavior of the patch on a testcase
> >>>>> like
> >>>>>
> >>>>> void foo (int n, int **k)
> >>>>> {
> >>>>>      for (int i = 0; i < n; ++i)
> >>>>>        if (k[0][i])
> >>>>>          for (int j = 0; j < n; ++j)
> >>>>>            if (k[1][j])
> >>>>>              for (int l = 0; l < n; ++l)
> >>>>>                if (k[2][l])
> >>>>>                  ...
> >>>>> }
> >>>>
> >>>> Theoretically the complexity is changing from L1(bbs) to
> >>>> L1(bbs)+L2(bbs)+L3(bbs)+…+Ln(bbs),
> >>>> so fill_always_executed_in_1's execution time is supposed to be increase
> >>>> from
> >>>> O(n) to O(n2)?  The time should depend on loop depth and bb counts.   I
> >>>> also drafted a
> >>>> test case has 73-depth loop function with 25 no-ipa function copies each
> >>>> compiled
> >>>> in lim2 and lim4 dependently.  Total execution time of
> >>>> fill_always_executed_in_1 is
> >>>> increased from 32ms to 58ms, almost doubled but not quadratic?
> >>>
> >>> It's more like n + (n-1) + (n-2) + ... + 1 which is n^2/2 but that's still
> >>> O(n^2).
> >>>
> >>>> It seems reasonable to see compiling time getting longer since most bbs
> >>>> are checked
> >>>> more but a MUST to ensure early break correctly in every loop level...
> >>>> Though loop nodes could be huge, loop depth will never be so large in
> >>>> actual code?
> >>>
> >>> The "in practice" argument is almost always defeated by automatic
> >>> program generators ;)
> >>>
> >>>>>    I suspect you'll see quadratic behavior with your patch.  You
> >>>>> should be at least able to preserve a check like
> >>>>>
> >>>>>              /* Do not process not always executed subloops to avoid
> >>>>>                 quadratic behavior.  */
> >>>>>              if (bb->loop_father->header == bb
> >>>>>                  && !dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>>>                break;
> >>>>>
> >>>>> which is of course not optimistic for cases like
> >>>>>
> >>>>>      for (..)
> >>>>>       {
> >>>>>         if (cond)
> >>>>>           for (..)
> >>>>>         x = 1; // this is always executed if the inner loop is finite
> >>>>>       }
> >>>>>
> >>>>> but we need to have an eye on the complexity of this function.  I would
> >>>>> have suggested to do greedy visiting of the loop header successors,
> >>>>> processing further blocks if all entries (ignoring backedges) are
> >>>>> processed, setting SET_ALWAYS_EXECUTED_IN.  When the worklist
> >>>>> is empty proceed to inner loops as the current code does.  For
> >>>>> bookkeeping simply keep a to-visit-incoming-edges counter per BB.
> >>>>> Pseudo-code:
> >>>>>
> >>>>>      bitmap_set_bit (worklist, loop-header-bb);
> >>>>>      while (!bitmap_empty_p (worklist))
> >>>>>        {
> >>>>>          bb = pop (worklist);
> >>>>
> >>>> Need check whether bb dominates latch before SET_ALWAYS_EXECUTED_IN?
> >>>
> >>> Ah, sure.
> >>>
> >>>>>          SET_ALWAYS_EXECUTED_IN (bb, loop);
> >>>>>          if (bitmap_bit_p (contains_call, bb->index))
> >>>>>            continue;
> >>>>>          FOR_EACH_EDGE (e, ei, bb->succs)
> >>>>>            {
> >>>>>              if (!flow_bb_inside_loop_p (loop, e->dest))
> >>>>>                continue;
> >>>>>              if (incoming_count[e->dest->index]-- == 0)
> >>>>>                push (worklist, e->dest);
> >>>>>            }
> >>>>>        }
> >>>>
> >>>> Sorry I don't fully understand your algorithm.  worklist should be
> >>>> auto_vec<basic_block> don't support bitmap operations?  Is incoming_count
> >>>> the bb's preds count, why need retain it since it it decreased to 0?
> >>>
> >>> the worklist is a auto_bitmap, pop () would be
> >>> bitmap_first_set_bit/clear_bit.  One could use a vector with the
> >>> caveat of eventually adding duplicate members to the worklist.
> >>> But as said, it's pseudo-code ;)
> >>
> >> Thanks a lot!
> >>
> >>>
> >>>>>      
> >>>>>      iterate over inner loops (incoming_count can be retained,
> >>>>>      we just force the inner loop header onto the worklist).
> >>>>
> >>>> Is this same to ?
> >>>>
> >>>>     for (loop = loop->inner; loop; loop = loop->next)
> >>>>       fill_always_executed_in_1 (loop, contains_call)
> >>>
> >>> Yes, but I'd simply use for (loop : loops_list (cfun, 0)) which
> >>> should iterate over loops in PRE order.
> >>
> >> Tried implementation with your pseudo-code, it works like the
> >> previous v2 solution, bb after inner loops could be marked
> >> as ALWAYS EXECUTED.  You are really so strong! ;)
> >> Regression tested pass on P8LE.
> >>
> >> Assume,
> >>
> >>    A: GCC Base
> >>    B: inner loop check removal
> >>    C: incoming count
> >>
> >> Execution time of fill_always_executed_in_1 for the 73-depth loop
> >> (with five function copies) is
> >> (A vs. B vs. C): 32ms vs 58ms vs 38ms.  Much better than O(n^2)?
> 
> Sorry didn't get C's execution time accurate enough due to forgot to remove
> dump file code in it.
> C's execution time is 25 ms after code refine.  It's even better than A.
> Attached the test case ssa-lim-22.c used for time measurement.
> 
> >>
> >>
> >> Bumped the patch to v3 with TODOs:
> >>   1. The time measurement code will be removed if the v3 is OK;
> >>   2. loops_list is not used as my code is not rebased to upstream yet.
> >>   3. Function comments is not updated.
> >>
> >>
> >>
> >> [PATCH v3] Fix incomplete computation in fill_always_executed_in_1
> >>
> >>
> >> ALWAYS_EXECUTED_IN is not computed completely for nested loops.
> >> Current design will exit if an inner loop doesn't dominate outer
> >> loop's latch or exit after exiting from inner loop, which
> >> caused early return from outer loop, then ALWAYS EXECUTED blocks after
> >> inner loops are skipped.  Another problem is it has to call
> >> get_loop_body_in_dom_order in each recursive call which is also
> >> inefficient.
> >>
> >> This patch does greedy visiting of the loop header successors,
> >> processing further blocks if all entries (ignoring backedges) are
> >> processed, setting SET_ALWAYS_EXECUTED_IN of dominates loop's latch.
> >> When the worklist is empty proceed to inner loops as the current
> >> code does.  For bookkeeping simply keep a to-visit-incoming-edges
> >> counter per BB, and pass it through iteration over inner loops.
> >>
> >> Details discusions:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577743.html
> >>
> >> gcc/ChangeLog:
> >>
> >>  * tree-ssa-loop-im.c (fill_always_executed_in_1): Remove
> >>  inn_loop check.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >>  * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> >> ---
> >>   gcc/tree-ssa-loop-im.c                     | 95 +++++++++++++---------
> >>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++
> >>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 30 +++++++
> >>   3 files changed, 111 insertions(+), 38 deletions(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> >>
> >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> >> index b24bc64f2a7..72be6b6bfac 100644
> >> --- a/gcc/tree-ssa-loop-im.c
> >> +++ b/gcc/tree-ssa-loop-im.c
> >> @@ -3192,39 +3192,52 @@ do_store_motion (void)
> >>      blocks that contain a nonpure call.  */
> >>   
> >>   static void
> >> -fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
> >> +fill_always_executed_in_1 (class loop *loop, sbitmap contains_call, int
> >> *bbi)
> >>   {
> >> -  basic_block bb = NULL, *bbs, last = NULL;
> >> -  unsigned i;
> >> +  basic_block bb = NULL;
> >>     edge e;
> >> -  class loop *inn_loop = loop;
> >>   
> >>     if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> >>       {
> >> -      bbs = get_loop_body_in_dom_order (loop);
> >> +      auto_bitmap work_set;
> >> +      bitmap_clear (work_set);
> > 
> > bitmap_clear isn't necessary
> 
> Removed.
> 
> > 
> >> +      bitmap_set_bit (work_set, loop->header->index);
> >> +      unsigned bb_index;
> >>   -      for (i = 0; i < loop->num_nodes; i++)
> >> -  {
> >> -    edge_iterator ei;
> >> -    bb = bbs[i];
> >> +      unsigned array_size = last_basic_block_for_fn (cfun) + 1;
> >> +      int *bbd = XNEWVEC (int, array_size);
> >> +      bbd = XDUPVEC (int, bbi, array_size);
> > 
> > I don't think you need to copy 'bbi' but you can re-use the
> > state from the outer loop processing.  Did you run into any
> > issues with that?
> 
> Yes.  For example, adding a small if-else block to ssa-lim-19.c,
> Then block "x->j += tem * i;" of bb 6 is always executed for loop 2, when call
> fill_always_executed_in_1 for loop 1, bbi[6] is decreased from 2 to 1 to 0,
> then if fill_always_executed_in_1 is called again for loop 2, it's value is
> not
> reset so bbi[6] won't be set ALWAYS EXECUTE, this is wrong.
> 
> 
> struct X { int i; int j; int k;};
> 
> void foo(struct X *x, int n, int l, int m)
> {
>  for (int j = 0; j < l; j++)  // loop 1
>    {
>      for (int i = 0; i < n; ++i)  // loop 2
>        {
>          if (m)
>            x->j++;
>          else
>            x->j = m+n+l;
> 
>          int *p = &x->j;   // bb 6
>          int tem = *p;
>          x->j += tem * i;
>        }
>      int *r = &x->k;
>      int tem2 = *r;
>      x->k += tem2 * j;
>    }
> }


Hmm, but if the outer loop processing reaches bb 6 then
it should have set it ALWAYS_EXECUTED in loop 1 already?

> 
> 
> >   
> >> +      while (!bitmap_empty_p (work_set))
> >> +  {
> >> +    bb_index = bitmap_first_set_bit (work_set);
> >> +    bitmap_clear_bit (work_set, bb_index);
> >> +    bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
> >>      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >> -      last = bb;
> >> -
> >> +      SET_ALWAYS_EXECUTED_IN (bb, loop);
> >>      if (bitmap_bit_p (contains_call, bb->index))
> >>        break;
> > 
> > I think you want to continue; here (process remaining worklist
> > but not continue greedy walking this block)
> 
> Same as above, if use 'continue' instead of 'break', the algorithm
> seems also not work again.  If inner loop contains a jump to outmost
> loop, the blocks after the jump block will be set to ALWAYS EXECUTE
> incorrectly.
> 
> > 
> >> -
> >> +    edge_iterator ei;
> >>      FOR_EACH_EDGE (e, ei, bb->succs)
> >>        {
> >> -        /* If there is an exit from this BB.  */
> >>          if (!flow_bb_inside_loop_p (loop, e->dest))
> >>     break;
> > 
> > in particular this should keep the outer 'bbi' valid to re-use.
> > 
> > But again, you want 'continue;' the greedy walk to other edges.
> > If that's not valid (I'd need to think about this) then with
> > your patch whether we process an edge depends on the order
> > of the edge visit so you'd have to walk successors twice,
> > once to determine whether we can greedily walk any of it
> > and once to actually do the greedy walk.
> > 
> > So thinking about it an exit edge is like a not returning call
> > and thus we indeed should not process any outgoing edges of
> > this block.
> > 
> >> +
> >>          /* Or we enter a possibly non-finite loop.  */
> >>          if (flow_loop_nested_p (bb->loop_father,
> >>             e->dest->loop_father)
> >>       && ! finite_loop_p (e->dest->loop_father))
> >>     break;
> > 
> > I think this is no longer necessary?  In any case it would
> > again be 'continue;', see also above.
> > 
> > I'm missing skipping of backedges in the walk.
> > 
> >> +
> >> +        if (bbd[e->dest->index] == 1)
> >> +        {
> >> +          bitmap_set_bit (work_set, e->dest->index);
> >> +          bbd[e->dest->index]--;
> >> +        }
> >> +        else if (bbd[e->dest->index] > 1)
> >> +          bbd[e->dest->index]--;
> > 
> > maybe just
> > 
> >              bbd[e->dest->index]--;
> >              if (bbd[e->dest->index] == 0)
> >                bitmap_set_bit (work_set, e->dest->index);
> 
> They are not equivalent.  Former won't decrease if bbd[e->dest->index]
> is 0, but the latter does.

we should be never visiting in this case, so sth indeed doesn't
work as I expected.

> > 
> >>        }
> >> +
> >>      if (e)
> >>        break;
> > 
> > continue; processing the worklist
> > 
> >>   
> >> @@ -3232,34 +3245,12 @@ fill_always_executed_in_1 (class loop *loop,
> >> @@ sbitmap contains_call)
> >>         to disprove this if possible).  */
> >>      if (bb->flags & BB_IRREDUCIBLE_LOOP)
> >>        break;
> > 
> > continue; processing the worklist.  I think this has to
> > come early, before processing successors, next to
> > the contains_call processing.  We could think of ignoring
> > entering of irreducible regions by tracking exits from them,
> > but I'm not sure we're identifying the regions in a good
> > enough way to allow this.
> > 
> > Likewise possibly infinite inner loops can be processed
> > when we track exits from those which would be sth like
> > 
> >      /* When this is an exit from an inner loop that we cannot
> >         prove as finite do not follow this edge.  */
> >      if (bb->loop_father != loop
> >          && loop_exit_edge_p (bb->loop_father, e)
> >          && ! finite_loop_p (bb->loop_father))
> >        continue;
> > 
> >> -
> >> -    if (!flow_bb_inside_loop_p (inn_loop, bb))
> >> -      break;
> >> -
> >> -    if (bb->loop_father->header == bb)
> >> -      {
> >> -        if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >> -          break;
> >> -
> >> -        /* In a loop that is always entered we may proceed anyway.
> >> -           But record that we entered it and stop once we leave it.  */
> >> -        inn_loop = bb->loop_father;
> >> -      }
> >> -  }
> >> -
> >> -      while (1)
> >> -  {
> >> -    SET_ALWAYS_EXECUTED_IN (last, loop);
> >> -    if (last == loop->header)
> >> -      break;
> >> -    last = get_immediate_dominator (CDI_DOMINATORS, last);
> >>    }
> >> -
> >> -      free (bbs);
> >> +      free (bbd);
> >>       }
> >>   
> >>     for (loop = loop->inner; loop; loop = loop->next)
> >> -    fill_always_executed_in_1 (loop, contains_call);
> >> +    fill_always_executed_in_1 (loop, contains_call, bbi);
> >>   }
> >>   
> >>   /* Fills ALWAYS_EXECUTED_IN information for basic blocks, i.e.
> >> @@ -3287,8 +3278,36 @@ fill_always_executed_in (void)
> >>    bitmap_set_bit (contains_call, bb->index);
> >>       }
> >>   +  mark_dfs_back_edges ();
> > 
> > Please put this to loop_invariant_motion_in_fun right before
> > the call to fill_always_executed_in.
> 
> Done.
> 
> > 
> >> +  unsigned array_size = last_basic_block_for_fn (cfun) + 1;
> >> +  int *bb_incoming_edges = XNEWVEC (int, array_size);
> > 
> > There's XCNEWVEC that also zeros the array.
> 
> OK, thanks.
> 
> > 
> >> +  memset (bb_incoming_edges, 0, array_size * sizeof (int));
> >> +  edge_iterator ei;
> >> +  edge e;
> >> +
> >> +  FOR_EACH_BB_FN (bb, cfun)
> >> +    {
> >> +      int len = bb->preds->length ();
> >> +      FOR_EACH_EDGE (e, ei, bb->preds)
> >> +  if (e->flags & EDGE_DFS_BACK)
> >> +    len--;
> >> +      bb_incoming_edges[bb->index] = len;
> >> +    }
> > 
> > it would be nice to combine this with the preceeding loop over
> > all blocks.
> 
> Done.
> 
> > 
> >> +  static unsigned long diff = 0;
> >> +  struct timeval tv;
> >> +  gettimeofday (&tv, NULL);
> >> +  unsigned long begin = (unsigned long) tv.tv_sec * 1000000 + tv.tv_usec;
> >> +
> >> +  //for (loop : loops_list (cfun, 0))
> >>     for (loop = current_loops->tree_root->inner; loop; loop = loop->next)
> >> -    fill_always_executed_in_1 (loop, contains_call);
> >> +    fill_always_executed_in_1 (loop, contains_call, bb_incoming_edges);
> >> +
> >> +  gettimeofday (&tv, NULL);
> >> +  unsigned long end = (unsigned long) tv.tv_sec * 1000000 + tv.tv_usec;
> >> +  diff += end - begin;
> >> +  //printf ("%s,diff:%ld\n", current_function_name (), diff);
> >> +  free (bb_incoming_edges);
> > 
> > Eh, looks very much like work in progress.  From the O() complexity
> > point duplicating bb_incoming_edges for each loop makes it quadratic
> > again but as said I think we don't need this duplication.
> 
> Sorry I didn't quite follow your above comments about reusing
> bb_incoming_edges,
> we are searching the bbs that dominate loop->latch loop by loop independently,
> if outer loop changes the inner loops bb_incoming_edges, it will make the
> inner
> loop's ALWAYS EXECUTE computation incorrect?

We're only supposed to greedily walk into always excuted blocks.  I see
this probably breaks down for conditionally executed inner loops like

  for ()
   {
    if ()
      for ();
    A;
   }

where we want to process the inner loop separately but we still want
to make sure A is marked as always executed in the outer loop.

> Seems duplicating bb_incoming_edges not increases the complexity but the
> recursive call does?

Since the bb_incoming_edges array has a size of N and we copy it
number_of_loop times (which is O(N) bound) the copying effectively
has O(N^2) cost.  So yes, it's the iteration over all loops,
whether by recursion or iteration.

I wonder if we can more cleverly handle this by picking up
conditionally executed inner loops on the fly for example
by recording a 'outermost unconditionally executed loop'
for each loop header we visit and using that like
SET_ALWAYS_EXECUTED_IN (bb, outermost_always_exec_in (bb->loop_father))

So make this a single greedy CFG walk starting at the outermost
loops.  loop exists will need to be processed specially then.

Richard.

> > 
> > I'd like to see the recursion removed, did the for (loop : ...)
> > not work out?
> 
> The recursion in fill_always_executed_in_1 could be removed since loops_list
> could iterate all inner loops but
> "for (loop = current_loops->tree_root->inner; loop; loop = loop->next)"
> only iterate sibling level loops?
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

Reply via email to