@kovasb Nice to see your interest in our TVM & TF NMT article :)

We have also had some internal discussions about adding a non-TF DL compiler 
backend to TF as a complement to XLA, and TVM is absolutely one of the best 
choices.

There are some principles I think we need to follow to ensure a smooth 
integration:
1. TVM-related support should live in a standalone GitHub repository, to keep 
TF and TVM loosely coupled;
2. The concrete way to achieve this loose coupling is to leverage TF's 
graph-optimization registration mechanism, which is invoked at TF runtime.
3. A new graph pass can be added on top of the TF graph optimization framework 
(just like TF XLA's MarkForCompilation, EncapsulateSubGraph, and 
BuildXLALaunchOp passes). This pass recognizes the portions of the TF graph 
that we think might benefit from the TVM backend, clusters those TF operations 
into a TF2TVMCompilation (or some other name) sub-graph, and finally replaces 
the clustered ops with a single TF2TVMBridgeOp macro op.
4. During the initial run of TF2TVMBridgeOp, the underlying TF ops are 
compiled into backend executables through the TVM infrastructure. To keep the 
compilation phase smooth, an extra IR layer may be necessary on top of TVM's 
own IR architecture; this should be open for design discussion.
5. After the initial run of TF2TVMBridgeOp, subsequent runs can invoke the 
compiled executable directly. Another round of compilation may be necessary 
when the input data shapes of the TF2TVMBridgeOp change (although TVM provides 
native support for dynamic shapes, we may wish to push the performance 
boundary by exploiting static shape information).
6. The initial scenario where I personally think TVM can complement TF and XLA 
is its native support for compute-intensive operations such as GEMM/Conv, 
which might be a good starting point. For non-compute-intensive operations 
(such as add/mul/reduce/transpose, etc.), I think XLA already provides good 
mechanism support, and we could follow XLA's infrastructure to optimize those 
operations directly.

There are some scenarios we estimate to be suitable for this feature, and we 
have already started the design and refinement work. If you are interested, it 
would be highly appreciated if you could share your concrete use cases or jump 
into the design & discussion directly.

Thanks

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3059#issuecomment-485226269
