I ran into the same confusion when tracing the FoldConstant optimization pass. My debug prints show the process below.
I don't know why (**anyone, please explain if you know**), but Relay kicks off the whole compilation process through the Interpreter (interpreter.cc):

* The 'EtaExpand' pass is explicitly listed in FoldConstant's own Sequential pass.
* The 'FuseOps' pass is triggered by the ConstEvaluate(expr) function in the ConstantFolder class.
* The 'InferType' pass is triggered as a resolved dependency of 'FuseOps'.
* CompileEngineImpl::JIT(key) is then called to do the JIT compilation, which kicks off the backend lower_call() to select among the TOPI schedule implementations.
* The process continues with the lower-level TIR passes, such as tir.ThreadSync, tir.SplitHostDevice, etc.

According to the Relay documentation, the Interpreter is mainly intended for debugging, i.e. a quick-and-dirty reference implementation. (A few minimal sketches that reproduce and inspect this path follow after the trace.)

```
transform.cc SequentialNode::operator(), pass name:FoldConstant
transform.cc SequentialNode::operator(), pass name:EtaExpand
transform.cc SequentialNode::operator(), pass name:FuseOps
transform.cc SequentialNode::operator(), resolved dependency pass name:InferType
transform.cc SequentialNode::operator(), pass name:InferType
interpreter.cc VisitExpr_(CallNode*): Invoke() -> calls JIT(key)
CompileEngineImpl::JIT(key)
Inside compile_engine.cc VisitExpr_(CallNode)
Calling into Python relay.backend.lower_call()
tvm/python/tvm/relay/backend/compile_engine.py, select_implementation(), op.name= multiply
valid implementation 0 : injective.cpu plevel= 10
selected best_plevel_implementation: injective.cpu
Use implementation injective.cpu for op multiply
tvm/python/tvm/relay/backend/_backend.py: lower function: fused_multiply
lower phase 0
lower phase 1
lower phase 2
lower phase 3
produce T_multiply {
  T_multiply[ramp(0, 1, 16)] = (x16(placeholder[0])*placeholder[ramp(0, 1, 16)])
}
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.ThreadSync
transform.cc SequentialNode::operator(), pass name:tir.ThreadSync
transform.cc SequentialNode::operator(), pass name:tir.InferFragment
transform.cc SequentialNode::operator(), pass name:tir.LowerThreadAllreduce
transform.cc SequentialNode::operator(), pass name:tir.BindDeviceType
transform.cc SequentialNode::operator(), pass name:tir.SplitHostDevice
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.LowerWarpMemory
transform.cc SequentialNode::operator(), pass name:tir.LowerDeviceStorageAccessInfo
transform.cc SequentialNode::operator(), pass name:tir.LowerIntrin
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:_transform
transform.cc SequentialNode::operator(), pass name:tir.LowerTVMBuiltin
transform.cc SequentialNode::operator(), pass name:tir.LowerDeviceStorageAccessInfo
transform.cc SequentialNode::operator(), pass name:tir.LowerIntrin
transform.cc SequentialNode::operator(), pass name:tir.CombineContextCall
runtime::Module Build(): target.build.llvm
```
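For anyone who wants to reproduce this, here is a minimal sketch that should hit the same path. The variable name, the shape `(16,)`, and the constant values are my own choices for illustration, not taken from the trace; the point is just that FoldConstant evaluates the constant subgraph with the interpreter, which is what drags FuseOps, InferType, and the CompileEngine JIT into the picture:

```python
import tvm
from tvm import relay

# Toy graph: (2.0 * 3.0) is a constant subgraph that FoldConstant can
# evaluate at compile time; x stays symbolic and is left alone.
x = relay.var("x", shape=(16,), dtype="float32")
const_part = relay.multiply(relay.const(2.0), relay.const(3.0))
y = relay.multiply(x, const_part)
mod = tvm.IRModule.from_expr(relay.Function([x], y))

# Applying the pass runs the interpreter on const_part, which triggers
# the FuseOps/InferType/JIT cascade seen in the trace above.
folded = relay.transform.FoldConstant()(mod)
print(folded)
```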
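The "resolved dependency pass name:InferType" line comes from the Sequential pass infrastructure itself. A sketch of the same mechanism from the Python side (again with a made-up toy module): FuseOps declares InferType among its required passes, so SequentialNode resolves and runs it automatically before FuseOps.

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(16,), dtype="float32")
mod = tvm.IRModule.from_expr(
    relay.Function([x], relay.multiply(x, relay.const(6.0))))

# FuseOps requires InferType, so SequentialNode runs InferType first:
# that is the "resolved dependency" line in the trace.
seq = tvm.transform.Sequential([relay.transform.FuseOps(fuse_opt_level=0)])
with tvm.transform.PassContext(opt_level=3):
    fused = seq(mod)
print(fused)
```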
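And since the Relay interpreter is exposed directly as the "debug" executor, you can poke at it without going through FoldConstant at all. A sketch, with the caveat that the executor's keyword arguments differ across TVM versions (older releases take `ctx=`, newer ones `device=`):

```python
import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(16,), dtype="float32")
mod = tvm.IRModule.from_expr(
    relay.Function([x], relay.multiply(x, relay.const(6.0))))

# The "debug" executor is the Relay interpreter (interpreter.cc), the
# same component FoldConstant evaluates constants with.
ex = relay.create_executor("debug", mod=mod, ctx=tvm.cpu(0), target="llvm")
print(ex.evaluate()(np.ones((16,), dtype="float32")))
```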