[apache/incubator-tvm] [RFC] Data-flow Analysis Functionality on TVM IR (#4468)

EDJ Wed, 04 Dec 2019 16:38:08 -0800

# Problem

When developing program passes on TVM IR (the one once was Halide IR), it is 
normal to ask for all sorts of information requiring program analysis, for 
example, live variable analysis for dead code elimination. This requirement 
becomes urgent when TVM has to directly issue intrinsic and the subsequent 
processing stage (for example LLVM) cannot analyze these outputs.


# Consideration

*  Program Analysis Functionality must be consistent with the immutable style 
of TVM IR/AST
*  A Simple Framework is required to eliminate as much boilerplate as possible 
when the programmer tries to come up with other types of data-flow analysis
*  The interface must be easy to adapt to

We first start with a concrete example -- Live Variable Analysis.

###  Live Variable Analysis

I will take Live Variable Analysis as an example. I will define a functor, 
**inputting** *an AST* and *the set of live variable* after the AST is 
executed, **outputting** *the set of live variable* before the AST is executed.

```cpp
class LiveAnalysisFunctor : public ExprFunctor<set(const Expr&, const set&)>,
                             public StmtFunctor<set(const Stmt&, const set&)>
```

This functor can be easily defined inductively on the *AST*, demonstrated in 
pseudo-code
```haskell
-- the function below taking AST and post-live-variables
LiveAnalysisFunctor (Block{stmtX,stmtY}) postAlive = 
     let intermediateAlive = LiveAnalysisFunctor stmtY postAlive
     in LiveAnalysisFunctor stmtX intermediateAlive
```
In other words, we propagate up the live variable from the end to the 
beginning, **according to the execution order of TVM IR**

When encountering *if*-statement, we do similarly
```haskell
LiveAnalysisFunctor (IfStmt{cond,branchThen, branchElse}) postAlive = 
     let beforeThen = LiveAnalysisFunctor branchThen postAlive
          beforeElse   = LiveAnalysisFunctor branchElse postAlive
          afterConditional = Union(beforeThen, beforeElse)
     in LiveAnalysisFunctor cond afterConditional
```
The computation of *for*-statement is the only that requiring fixpoint 
computation. 

Also, we know at the endpoint, the live variables will be empty. 

Several (only relevant to live variable analysis) trivial fact:
* two variables are the same when they are declared at the same place in the 
program (since we have scopes in the AST, we also have variables with the same 
name)
* We have to distinguish, for each variable/FunctionRef, whether it is def or 
use

#### Memorization

You can see that in the current situation, to have the live variable 
information at arbitrary point, we need to calculate from very end to the point 
(and if that point is inside a for-loop, we need to make sure the fixpoint is 
reached). Thus it is suggested we only compute one time and saving the result. 
Fortunately, the above functor is pure, thus we can try to cache the above 
functor into a standard dictionary/map data structure, and thus *an AST* and 
*the set of live variable* as **key**, and *the set of live variable* before 
the AST as **value**. 


The memorization strategy should be able to apply to any *pure* IRFunctor:
```cpp
template <functor>
class MemorizedVersion<functor> : functor
```

During the computation of the liveness of the very beginning of the program, 
the whole program will be visited by the Functor and thus will be cached into 
the dictionary.

However, we may only need the information when fixpoint is reached, which means 
we can keep forgetting the old value every time a new pair of *AST* and 
*PostAlive* is computed while the *AST* has already once inserted into the 
dictionary with other *PostAlive'*. 

Ultimately, we want to transform the above dictionary into another dictionary 
taking *AST* as **key**, and the *live variable before executing AST* and *live 
variable after executing AST* as **value**. Of course, this dictionary has a 
clear specification that "only when the end of the whole program has empty set 
as live variables, each point in the AST has the live variable set as indicated 
in the dictionary".

Also note that, it is possible to query everywhere in the program about the 
Liveness Infomation, as long as the Functor is properly defined everywhere.

# Proposal

We generalize the above functor into forward/backward data-flow analysis 
'framework' to remove boilerplate code as much as possible

```cpp
// forward, functional, DFA
template <class lattice,        // the type lattice, also the 
prestate/poststate type
      class ltProof,      // the proof that lattice is a lattice, a subtype of 
IsLattice<lattice>
      const ltProof& ltOp> // lattice Operations, bind the operation at compile 
time
class ffdfa : public ExprFunctor<lattice(const Expr&, const lattice&)>,
              public StmtFunctor<lattice(const Stmt&, const lattice&)> {
  static_assert(std::is_base_of<IsLattice<lattice>, ltProof>::value, "ltProof 
must be a class derived from IsLattice<>");
...
}

template<class T>
class IsLattice { // classical lattice definition, T is lattice under the 
following operation
  public:
    // a least upperbound of two
    T join(const T& a, const T& b) = 0;
    // a greatest lower bound of two
    T meet(const T& a, const T& b) = 0;
    // highest
    const T& top() = 0;
    // whole set
    const T& bottom() = 0;
    bool equal(const T& a, const T& b) = 0;
};
```

I will indicate the usage of `IsLattice` in the comment section. This part (the 
way defining lattice) is actually quite fluid and more about the programming 
style.

There will also be another class `bfdfa`.

Memorization part:
```cpp

template <typename R, // Return type of the functor that want to be memorized
            typename KType,  // key type of the dictionary
            MemorizingPolicy m, // currently only have policy 'override'
            typename ParentFunctor, // the functor that want to be memorized
            typename ...Args, 
            typename ...InitArgs> // initialization argument to used when 
initializing the dictionary
class RecordingExprFunctor<KType(const Expr& n, Args... args),  // expect  
function that map the input to the KeyType
                          ParentFunctor,
                          m> : public ParentFunctor { ...
using ThisFunctionType = R(const Expr& n, Args... args);
static_assert(std::is_base_of<ExprFunctor<ThisFunctionType>, 
ParentFunctor>::value, "ParentFunctor must be of ExprFunctor Type");


// we propose to have a unique_ptr to the memorizing dictionary (named as "A", 
that record the evaluation of the functor), so that reuse of MemorizedVersion 
will not act on the already filled dictionary (after each memorization, 
std::move it out).

// we will have another dictionary (called "B" here), that map KeyType to the 
input type


R VisitExpr(const Expr& n, Args... args) override;

// Everytime VisitExpr is called with input ‘x’, it will first check if the 
current dict B has the KeyType and if it has and the key is also mapped to the 
"same" input 'x', then we can just query the dict A
// Otherwise we need to evaluate and according to the above MemorizingPolicy 
'override', we will insert(replace) the new(already existent) the evaluated 
value into dict A

// Designed in this twisted way so that if KeyType = AST, InputType=<AST x 
Lattice>, we can capture the final value of the fixpoint computation.

// Of course here if KeyType = InputType, it will become trivial memorization 
and don't need dict B
}

// Similar for StmtFunctor

// Then IRFunctor
template <typename R, 
            typename KType, 
            MemorizingPolicy m,
            typename ParentFunctor,
            typename ...Args, 
            typename ...InitArgs>
class RecordingIRFunctor<KType(const NodeRef& n, Args... args), 
                          ParentFunctor,
                          m> : RecordingExprFunctor<KType(const Expr& n, 
Args... args), ParentFunctor, m>,
                               RecordingStmtFunctor<KType(const Stmt& n, 
Args... args), ParentFunctor, m> {...
}
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4468

[apache/incubator-tvm] [RFC] Data-flow Analysis Functionality on TVM IR (#4468)

Reply via email to