[PATCH] D131646: [clang][dataflow] Restructure loops to call widen on back edges

Eric Li via Phabricator via cfe-commits Fri, 19 Aug 2022 13:09:08 -0700

li.zhe.hua added inline comments.


================
Comment at: 
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:168-169
+// back edge block. (That is, all paths from the entry block to the back edge
+// block must go through `Block`.) It also means that there are only two
+// predecessors; the other is along the path from the entry block to `Block`.
+static const CFGBlock *findBackEdge(const CFGBlock *Block) {
----------------
sgatev wrote:
> li.zhe.hua wrote:
> > xazax.hun wrote:
> > > Is this also true when we have multiple `continue` statements in the loop?
> > Yes. The end of the loop, and each of the `continue` statements, target the 
> > back edge block. They all get funneled through that back edge to `Block`, 
> > such that `Block` only has two predecessors. However, I haven't verified 
> > this in the CFG code, only by not finding a counterexample.
> Does that hold for back edges stemming from `goto` statements?
Ah, `goto` does throw a wrench in all of this. Specifically, it is problematic 
only for `do ... while` loops, where the labeled statement is the first 
statement of the `do` block.

I took out the sentence, took out the `assert`, and added a `FIXME` for 
`prepareBackEdges`.


================
Comment at: 
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:228-234
+  // If we are at the start of a loop, we will have two precessors, but we 
don't
+  // want to join these two predecessors. Instead, we want to take the back 
edge
+  // block (i.e. the result of the previous iteration) and use that directly as
+  // the input block.
+  //
+  // For the first iteration of loop, the "zeroth" iteration state is set up by
+  // `prepareBackEdges`.
----------------
xazax.hun wrote:
> xazax.hun wrote:
> > li.zhe.hua wrote:
> > > xazax.hun wrote:
> > > > Could you elaborate on this? Let's consider this loop:
> > > > 
> > > > ```
> > > >  Pred
> > > >    |
> > > >    v
> > > >  LoopHeader <---BackEdge
> > > > ```
> > > > 
> > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > > 
> > > > I would expect the analysis to always compute `join(PredState, 
> > > > BackEdgeState)`, and I would expect the widening to happen between the 
> > > > previous iteration of `BackEdgeState` and the current iteration of 
> > > > `BackEdgeState`. So, I wonder if we already invoke the widening 
> > > > operator along back edges, wouldn't the regular logic work just fine? 
> > > > Do I miss something?
> > > > 
> > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > 
> > > We don't, and this is what the comment
> > > 
> > > ```
> > >   // For the first iteration of loop, the "zeroth" iteration state is set 
> > > up by
> > >   // `prepareBackEdges`.
> > > ```
> > > failed to explain. After transferring `PredState`, we copy `PredState` 
> > > into `BackEdgeState`, which is done in `prepareBackEdges`.
> > > 
> > > > I would expect the analysis to always compute `join(PredState, 
> > > > BackEdgeState)`
> > > 
> > > I'm not sure I see that we should always join `PredState` and 
> > > `BackEdgeState`. Execution doesn't go from `Pred` into the Nth iteration 
> > > of the loop, it only goes from `Pred` into the first iteration of the 
> > > loop, e.g. the predecessor for the 4th iteration of the loop is only the 
> > > back-edge from the 3rd iteration of the loop, not `Pred`.
> > > 
> > > Let me know if this makes sense.
> > > I'm not sure I see that we should always join `PredState` and 
> > > `BackEdgeState`. Execution doesn't go from `Pred` into the Nth iteration 
> > > of the loop, it only goes from `Pred` into the first iteration of the 
> > > loop, e.g. the predecessor for the 4th iteration of the loop is only the 
> > > back-edge from the 3rd iteration of the loop, not `Pred`.
> > > 
> > > Let me know if this makes sense.
> > 
> > The analysis state of the dataflow analysis supposed to overestimate all of 
> > the paths. Consider the following loop and imagine we want to calculate 
> > intervals for integer variables:
> > 
> > ```
> > int i = 0;
> > while (...) {
> >   [[program point A]];
> >   ++i;
> > }
> > ```
> > 
> > During the analysis of this loop, the value `i ==0` flows to `[[program 
> > point A]]`. This is the motivation to join the state from the back edge and 
> > from PredState. As far as I understand, you want to achieve this by copying 
> > PredState to the BackEdgeState before the first iteration. But this also 
> > means that we will use the widening operator to combine PredState with the 
> > state after N iterations instead of the regular join operation. I am not 
> > entirely sure whether these two approaches always produce the same results. 
> > 
> > 
> A contrived example (just came up with this in a couple minutes so feel free 
> to double check if it is OK):
> Consider:
> ```
> int i = 0;
> while (...) {
>   if (i == 0)
>     i = -2;
>   ++i;
> }
> ```
> 
> Some states:
> ```
> PredState = i : [0, 0]
> BackEdgeState (after first iteration) = i : [-1, -1]
> ```
> 
> And the results of join vs widen:
> ```
> PredState.join(BackEdgeState) = i : [-1, 0]
> PredState.widen(BackEdge) = i : [-inf, 0]
> ```
> 
> The extra widening with the PredState can result in extra imprecision. 
So, my understanding is that widening introduces imprecision as a trade-off for 
convergence. Yes, in the contrived example, joining converges after a few 
iterations, but in other cases it never converges.

---

Looking at the Rival and Yi book, there are no intervening joins with the 
predecessor for the first loop as we analyze the loop. This is roughly where I 
drew from in terms of this implementation.

```
analysis(iter{p}, a) = { R <- a;
                         repeat
                             T <- R;
                             R <- widen(R, analysis(p, R));
                         until inclusion(R, T)
                         return T;
```



================
Comment at: 
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp:270
+
+TEST(DataflowAnalysisTest, ConvergesOnWidenAnalysis) {
+  std::string Code = R"(
----------------
sgatev wrote:
> There's already a set of "widening" tests - 
> http://google3/third_party/llvm/llvm-project/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp;l=527-712;rcl=462638952
> 
> What do you think about refactoring those so that we have tests that exercise 
> the framework with both `join` and `widen`?
This commit implements widening only for the lattice. Those tests use the 
`NoopLattice`, so their behavior is not expected to change with this commit. 
Splitting `merge` into a `join` and `widen` is out-of-scope for this commit, 
and is what I am hoping to work on next.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131646/new/

https://reviews.llvm.org/D131646

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D131646: [clang][dataflow] Restructure loops to call widen on back edges

Reply via email to