I stumbled over the same confusion when tracing the FoldConstant optimization 
pass. Some of my debug prints (shown below) trace the process. 

I don't know why (**anyone, please explain if you know**), but Relay kicks off the 
whole compilation process through the Interpreter (interpreter.cc). A minimal script 
that reproduces this behavior is sketched after the list below. 
* The 'EtaExpand' pass is explicitly listed in FoldConstant's own Sequential 
pass. 
* The 'FuseOps' pass is triggered by the ConstEvaluate(expr) function in the 
ConstantFolder class. 
* The 'InferType' pass is triggered as a dependency of 'FuseOps'.
* Then CompileEngineImpl::JIT(key) is called to do the JIT compilation, which kicks 
off the backend lower_call() to select among the TOPI schedule implementations.
* The process continues with lower-level TIR passes such as tir.ThreadSync, 
tir.SplitHostDevice, etc. 
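
For context, here is the kind of minimal script that walks through exactly this chain. This is my own sketch, not taken from the trace; the shapes and constants are made up, and it assumes a TVM version where `tvm.IRModule` and `relay.transform.FoldConstant` are available (older releases use `relay.Module` instead).

```
import numpy as np
import tvm
from tvm import relay

# y = (2 * c) + x: the `2 * c` sub-expression has no free variables, so
# FoldConstant has to actually execute the multiply (via the Interpreter)
# and replace it with a constant tensor.
x = relay.var("x", shape=(16,), dtype="float32")
c = relay.const(np.ones(16, dtype="float32"))
y = relay.add(relay.multiply(relay.const(2.0, "float32"), c), x)
mod = tvm.IRModule.from_expr(relay.Function([x], y))

mod = relay.transform.FoldConstant()(mod)
print(mod)  # the multiply is gone; its result is embedded as a constant
```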

According to the Relay documentation, the 'Interpreter' is mainly intended for 
debugging, as a quick-and-dirty reference implementation. 
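
For what it's worth, this appears to be the same Interpreter that the user-facing "debug" executor exposes, so you can poke at it directly. Again, just a sketch under the same assumptions as above:

```
import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(16,), dtype="float32")
f = relay.Function([x], relay.multiply(x, relay.const(2.0, "float32")))
mod = tvm.IRModule.from_expr(f)

# kind="debug" selects the interpreter-based executor (interpreter.cc); each
# CallNode it visits is JIT-compiled through the compile engine, which is the
# Invoke() -> JIT(key) path seen in the trace below.
ex = relay.create_executor(kind="debug", mod=mod, target="llvm")
print(ex.evaluate()(np.ones(16, dtype="float32")))
```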

```
transform.cc SequentialNode::operator(), pass name:FoldConstant
transform.cc SequentialNode::operator(), pass name:EtaExpand
transform.cc SequentialNode::operator(), pass name:FuseOps
transform.cc SequentialNode::operator(), resolved dependency pass name:InferType
transform.cc SequentialNode::operator(), pass name:InferType
interpreter.cc VisitExpr_(CallNode*): Invoke() -> calls JIT(key)
CompileEngineImpl::JIT(key)
Inside compile_engine.cc VisitExpr_(CallNode)
 Calling into Python relay.backend.lower_call()
tvm/python/tvm/relay/backend/compile_engine.py, select_implementation(), 
op.name= multiply
  valid implementation  0 :  injective.cpu plevel= 10
  selected best_plevel_implementation:  injective.cpu
Use implementation injective.cpu for op multiply
tvm/python/tvm/relay/backend/_backend.py: lower function:  fused_multiply
lower phase 0
lower phase 1
lower phase 2
lower phase 3
produce T_multiply {
  T_multiply[ramp(0, 1, 16)] = (x16(placeholder[0])*placeholder[ramp(0, 1, 16)])
}

transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.ThreadSync
transform.cc SequentialNode::operator(), pass name:tir.ThreadSync
transform.cc SequentialNode::operator(), pass name:tir.InferFragment
transform.cc SequentialNode::operator(), pass name:tir.LowerThreadAllreduce
transform.cc SequentialNode::operator(), pass name:tir.BindDeviceType
transform.cc SequentialNode::operator(), pass name:tir.SplitHostDevice
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.LowerWarpMemory
transform.cc SequentialNode::operator(), pass name:tir.LowerDeviceStorageAccessInfo
transform.cc SequentialNode::operator(), pass name:tir.LowerIntrin
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.LowerTVMBuiltin
transform.cc SequentialNode::operator(), pass name:tir.LowerDeviceStorageAccessInfo
transform.cc SequentialNode::operator(), pass name:tir.LowerIntrin
transform.cc SequentialNode::operator(), pass name:tir.CombineContextCall
runtime::Module Build(): target.build.llvm
```
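
The `produce T_multiply` block at the end of the lower phases is just the element-wise (injective) compute for `multiply`. Roughly the same lowered TIR can be reproduced by hand with TE; this is a sketch of what I believe the selected injective.cpu implementation boils down to, with placeholder names chosen by me:

```
import tvm
from tvm import te

n = 16
a = te.placeholder((1,), name="placeholder", dtype="float32")   # the folded scalar constant
b = te.placeholder((n,), name="placeholder1", dtype="float32")  # the 16-element operand
T_multiply = te.compute((n,), lambda i: a[0] * b[i], name="T_multiply")

s = te.create_schedule(T_multiply.op)
s[T_multiply].vectorize(T_multiply.op.axis[0])  # yields the ramp(0, 1, 16) / x16 form
print(tvm.lower(s, [a, b, T_multiply], simple_mode=True))
```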
