# Problem
When developing program passes on TVM IR (the one once was Halide IR), it is
normal to ask for all sorts of information requiring program analysis, for
example, live variable analysis for dead code elimination. This requirement
becomes urgent when TVM has to directly issue intrinsic and the subsequent
processing stage (for example LLVM) cannot analyze these outputs.
# Consideration
* Program Analysis Functionality must be consistent with the immutable style
of TVM IR/AST
* A Simple Framework is required to eliminate as much boilerplate as possible
when the programmer tries to come up with other types of data-flow analysis
* The interface must be easy to adapt to
We first start with a concrete example -- Live Variable Analysis.
### Live Variable Analysis
I will take Live Variable Analysis as an example. I will define a functor,
**inputting** *an AST* and *the set of live variable* after the AST is
executed, **outputting** *the set of live variable* before the AST is executed.
```cpp
class LiveAnalysisFunctor : public ExprFunctor<set(const Expr&, const set&)>,
public StmtFunctor<set(const Stmt&, const set&)>
```
This functor can be easily defined inductively on the *AST*, demonstrated in
pseudo-code
```haskell
-- the function below taking AST and post-live-variables
LiveAnalysisFunctor (Block{stmtX,stmtY}) postAlive =
let intermediateAlive = LiveAnalysisFunctor stmtY postAlive
in LiveAnalysisFunctor stmtX intermediateAlive
```
In other words, we propagate up the live variable from the end to the
beginning, **according to the execution order of TVM IR**
When encountering *if*-statement, we do similarly
```haskell
LiveAnalysisFunctor (IfStmt{cond,branchThen, branchElse}) postAlive =
let beforeThen = LiveAnalysisFunctor branchThen postAlive
beforeElse = LiveAnalysisFunctor branchElse postAlive
afterConditional = Union(beforeThen, beforeElse)
in LiveAnalysisFunctor cond afterConditional
```
The computation of *for*-statement is the only that requiring fixpoint
computation.
Also, we know at the endpoint, the live variables will be empty.
Several (only relevant to live variable analysis) trivial fact:
* two variables are the same when they are declared at the same place in the
program (since we have scopes in the AST, we also have variables with the same
name)
* We have to distinguish, for each variable/FunctionRef, whether it is def or
use
#### Memorization
You can see that in the current situation, to have the live variable
information at arbitrary point, we need to calculate from very end to the point
(and if that point is inside a for-loop, we need to make sure the fixpoint is
reached). Thus it is suggested we only compute one time and saving the result.
Fortunately, the above functor is pure, thus we can try to cache the above
functor into a standard dictionary/map data structure, and thus *an AST* and
*the set of live variable* as **key**, and *the set of live variable* before
the AST as **value**.
The memorization strategy should be able to apply to any *pure* IRFunctor:
```cpp
template <functor>
class MemorizedVersion<functor> : functor
```
During the computation of the liveness of the very beginning of the program,
the whole program will be visited by the Functor and thus will be cached into
the dictionary.
However, we may only need the information when fixpoint is reached, which means
we can keep forgetting the old value every time a new pair of *AST* and
*PostAlive* is computed while the *AST* has already once inserted into the
dictionary with other *PostAlive'*.
Ultimately, we want to transform the above dictionary into another dictionary
taking *AST* as **key**, and the *live variable before executing AST* and *live
variable after executing AST* as **value**. Of course, this dictionary has a
clear specification that "only when the end of the whole program has empty set
as live variables, each point in the AST has the live variable set as indicated
in the dictionary".
Also note that, it is possible to query everywhere in the program about the
Liveness Infomation, as long as the Functor is properly defined everywhere.
# Proposal
We generalize the above functor into forward/backward data-flow analysis
'framework' to remove boilerplate code as much as possible
```cpp
// forward, functional, DFA
template <class lattice, // the type lattice, also the
prestate/poststate type
class ltProof, // the proof that lattice is a lattice, a subtype of
IsLattice<lattice>
const ltProof& ltOp> // lattice Operations, bind the operation at compile
time
class ffdfa : public ExprFunctor<lattice(const Expr&, const lattice&)>,
public StmtFunctor<lattice(const Stmt&, const lattice&)> {
static_assert(std::is_base_of<IsLattice<lattice>, ltProof>::value, "ltProof
must be a class derived from IsLattice<>");
...
}
template<class T>
class IsLattice { // classical lattice definition, T is lattice under the
following operation
public:
// a least upperbound of two
T join(const T& a, const T& b) = 0;
// a greatest lower bound of two
T meet(const T& a, const T& b) = 0;
// highest
const T& top() = 0;
// whole set
const T& bottom() = 0;
bool equal(const T& a, const T& b) = 0;
};
```
I will indicate the usage of `IsLattice` in the comment section. This part (the
way defining lattice) is actually quite fluid and more about the programming
style.
There will also be another class `bfdfa`.
Memorization part:
```cpp
template <typename R, // Return type of the functor that want to be memorized
typename KType, // key type of the dictionary
MemorizingPolicy m, // currently only have policy 'override'
typename ParentFunctor, // the functor that want to be memorized
typename ...Args,
typename ...InitArgs> // initialization argument to used when
initializing the dictionary
class RecordingExprFunctor<KType(const Expr& n, Args... args), // expect
function that map the input to the KeyType
ParentFunctor,
m> : public ParentFunctor { ...
using ThisFunctionType = R(const Expr& n, Args... args);
static_assert(std::is_base_of<ExprFunctor<ThisFunctionType>,
ParentFunctor>::value, "ParentFunctor must be of ExprFunctor Type");
// we propose to have a unique_ptr to the memorizing dictionary (named as "A",
that record the evaluation of the functor), so that reuse of MemorizedVersion
will not act on the already filled dictionary (after each memorization,
std::move it out).
// we will have another dictionary (called "B" here), that map KeyType to the
input type
R VisitExpr(const Expr& n, Args... args) override;
// Everytime VisitExpr is called with input ‘x’, it will first check if the
current dict B has the KeyType and if it has and the key is also mapped to the
"same" input 'x', then we can just query the dict A
// Otherwise we need to evaluate and according to the above MemorizingPolicy
'override', we will insert(replace) the new(already existent) the evaluated
value into dict A
// Designed in this twisted way so that if KeyType = AST, InputType=<AST x
Lattice>, we can capture the final value of the fixpoint computation.
// Of course here if KeyType = InputType, it will become trivial memorization
and don't need dict B
}
// Similar for StmtFunctor
// Then IRFunctor
template <typename R,
typename KType,
MemorizingPolicy m,
typename ParentFunctor,
typename ...Args,
typename ...InitArgs>
class RecordingIRFunctor<KType(const NodeRef& n, Args... args),
ParentFunctor,
m> : RecordingExprFunctor<KType(const Expr& n,
Args... args), ParentFunctor, m>,
RecordingStmtFunctor<KType(const Stmt& n,
Args... args), ParentFunctor, m> {...
}
```
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4468