xazax.hun added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:228-234
+  // If we are at the start of a loop, we will have two predecessors, but we
+  // don't want to join these two predecessors. Instead, we want to take the
+  // back edge block (i.e. the result of the previous iteration) and use that
+  // directly as the input block.
+  //
+  // For the first iteration of the loop, the "zeroth" iteration state is set
+  // up by `prepareBackEdges`.
----------------
li.zhe.hua wrote:
> xazax.hun wrote:
> > xazax.hun wrote:
> > > li.zhe.hua wrote:
> > > > xazax.hun wrote:
> > > > > Could you elaborate on this? Let's consider this loop:
> > > > > 
> > > > > ```
> > > > >  Pred
> > > > >    |
> > > > >    v
> > > > >  LoopHeader <---BackEdge
> > > > > ```
> > > > > 
> > > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > > > 
> > > > > I would expect the analysis to always compute `join(PredState,
> > > > > BackEdgeState)`, and I would expect the widening to happen between the
> > > > > previous iteration of `BackEdgeState` and the current iteration of
> > > > > `BackEdgeState`. So, if we already invoke the widening operator along
> > > > > back edges, wouldn't the regular logic work just fine? Am I missing
> > > > > something?
> > > > > 
> > > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > > 
> > > > We don't, and this is what the comment
> > > > 
> > > > ```
> > > >   // For the first iteration of the loop, the "zeroth" iteration state
> > > >   // is set up by `prepareBackEdges`.
> > > > ```
> > > > failed to explain. After transferring `PredState`, we copy `PredState` 
> > > > into `BackEdgeState`, which is done in `prepareBackEdges`.
> > > > 
> > > > > I would expect the analysis to always compute `join(PredState, 
> > > > > BackEdgeState)`
> > > > 
> > > > I'm not sure I see that we should always join `PredState` and
> > > > `BackEdgeState`. Execution doesn't go from `Pred` into the Nth
> > > > iteration of the loop; it only goes from `Pred` into the first
> > > > iteration, e.g. the predecessor for the 4th iteration of the loop is
> > > > only the back edge from the 3rd iteration, not `Pred`.
> > > >
> > > > Let me know if this makes sense.
> > > 
> > > The analysis state of a dataflow analysis is supposed to overapproximate
> > > all of the paths. Consider the following loop and imagine we want to
> > > calculate intervals for integer variables:
> > > 
> > > ```
> > > int i = 0;
> > > while (...) {
> > >   [[program point A]];
> > >   ++i;
> > > }
> > > ```
> > > 
> > > During the analysis of this loop, the value `i == 0` flows to
> > > `[[program point A]]`. This is the motivation for joining the state from
> > > the back edge with `PredState`. As far as I understand, you want to
> > > achieve this by copying `PredState` to `BackEdgeState` before the first
> > > iteration. But this also means that we will use the widening operator to
> > > combine `PredState` with the state after N iterations, instead of the
> > > regular join operation. I am not entirely sure whether these two
> > > approaches always produce the same results.
> > > 
> > > 
> > A contrived example (I just came up with this in a couple of minutes, so
> > feel free to double-check it). Consider:
> > ```
> > int i = 0;
> > while (...) {
> >   if (i == 0)
> >     i = -2;
> >   ++i;
> > }
> > ```
> > 
> > Some states:
> > ```
> > PredState = i : [0, 0]
> > BackEdgeState (after first iteration) = i : [-1, -1]
> > ```
> > 
> > And the results of join vs widen:
> > ```
> > PredState.join(BackEdgeState) = i : [-1, 0]
> > PredState.widen(BackEdgeState) = i : [-inf, 0]
> > ```
> > 
> > The extra widening with `PredState` can result in extra imprecision.
> So, my understanding is that widening introduces imprecision as a trade-off 
> for convergence. Yes, in the contrived example, joining converges after a few 
> iterations, but in other cases it never converges.
> 
> ---
> 
> Looking at the Rival and Yi book, there are no intervening joins with the
> predecessor state as we analyze the loop. This is roughly what I drew from
> for this implementation.
> 
> ```
> analysis(iter{p}, a) = { R <- a;
>                          repeat
>                              T <- R;
>                              R <- widen(R, analysis(p, R));
>                          until inclusion(R, T)
>                          return T; }
> ```
> 

In my example I do not advocate for NOT doing widening. I am advocating for
not applying the widening operator with the loop pre-state and the loop
iteration state as operands; my example demonstrates how that introduces
additional imprecision. Using the widening operator only between loop
iterations, and not between those two states, would still ensure convergence
but would give more precision. My argument is that widening in that step is
not required for convergence, yet we pay for it in precision. Or do you have
an example where you need widening for those two specific states?
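
For concreteness, here is a small, self-contained sketch of the two operations
on a classic interval domain; the `Interval`, `join`, and `widen` helpers are
made up for illustration and are not the framework's API. It reproduces the
two results from my example above:

```
#include <algorithm>
#include <climits>
#include <cstdio>

// Interval with LONG_MIN/LONG_MAX standing in for -inf/+inf.
struct Interval {
  long Lo, Hi;
};

// Regular join: the least upper bound of the two intervals.
static Interval join(Interval A, Interval B) {
  return {std::min(A.Lo, B.Lo), std::max(A.Hi, B.Hi)};
}

// Classic interval widening: any bound that is still moving jumps to
// infinity so that the fixpoint iteration is guaranteed to terminate.
static Interval widen(Interval Prev, Interval Next) {
  return {Next.Lo < Prev.Lo ? LONG_MIN : Prev.Lo,
          Next.Hi > Prev.Hi ? LONG_MAX : Prev.Hi};
}

int main() {
  Interval PredState = {0, 0};       // i : [0, 0] before the loop
  Interval BackEdgeState = {-1, -1}; // i : [-1, -1] after the first iteration

  Interval J = join(PredState, BackEdgeState);
  Interval W = widen(PredState, BackEdgeState);
  std::printf("join : [%ld, %ld]\n", J.Lo, J.Hi); // join : [-1, 0]
  std::printf("widen: [%ld, %ld]\n", W.Lo, W.Hi); // widen: [LONG_MIN, 0], i.e. [-inf, 0]
}
```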

> Looking at the Rival and Yi book, there are no intervening joins with the
> predecessor state as we analyze the loop. This is roughly what I drew from
> for this implementation.

Later in the book, they talk about loop unrolling, where we don't use the
widening operator in the first couple of iterations. The imprecision I was
talking about here could also be avoided by unrolling the first iteration; a
sketch of such a "delayed" widening is below. I believe the book might not
talk about this technique because the unrolling subsumes it.
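
Reusing the hypothetical `Interval`, `join`, and `widen` helpers from the
sketch above, a delayed widening operator could look like the following;
behaving like a plain join for the first `K` iterations has the same effect as
unrolling them:

```
// Hypothetical delayed widening: act as a regular join for the first K
// iterations of the fixpoint loop (the effect of unrolling them), then
// switch to the real widening so the iteration still converges.
static Interval widenDelayed(Interval Prev, Interval Next, unsigned Iter,
                             unsigned K = 1) {
  return Iter < K ? join(Prev, Next) : widen(Prev, Next);
}
```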

The main reason I am advocating for this is that, if we are willing to accept
the join there, we would no longer need to prepare the back edges; that would
simplify the implementation while improving the precision of the analysis.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131646/new/

https://reviews.llvm.org/D131646
