**Problem**

For TVM to work as a generic compiler for frontend frameworks, it needs to
support all of the operators that those frameworks provide. Relay already
supports many operators across numerous frontends, and this operator list
tends to be sufficient for common use cases and model architectures. However,
this approach doesn't generalize to new model architectures, or to model
developers who use unique, specialty operators provided by the framework. It
puts us in a constant state of playing catch-up.

As a more concrete example, I was recently investigating an NLP model that
uses TensorFlow's lookup table for embedding lookup and its
contrib.seq2seq.GatherTree operator as part of beam search. Since neither of
these operators is supported in Relay, I started looking into implementing
them. However, I found it difficult to justify the effort of implementing an
operator in Relay+TOPI for a one-off, potentially esoteric use case. Further,
ops such as the lookup table should already be very fast in TensorFlow, so
there really isn't a need to compile them with TVM.

**Proposed Solution**

I think that unsupported operators should not prevent us from running a
graph. When an operator is not supported, we can fall back and run that
operator in the original framework's runtime. This hybrid approach will
certainly not be as performant as running the entire graph on the TVM
runtime, but it will unblock users running graphs from new model
architectures with brand-new operators.

As I mentioned above, NLP models are a great example of this. Since many
people implement their embedding lookups differently, we cannot be certain
that all of those ops will be supported. However, the core model logic (such
as a Transformer or RNN) is generally supported by TVM. A hybrid approach
will let us run the embedding lookup in the native framework and use TVM to
optimize the core model, which also tends to be the more computationally
expensive part.

**Proposed Implementation**

I propose creating a new operator in TVM that will run a generic graph or 
subgraph in the native frontend framework.

Let's look at an example for TensorFlow:

When we see an operator that is not in the convert map, we can create a Relay
node for a new op called `TensorFlowRunner`. Since TensorFlow can execute
subgraphs by simply passing input/output nodes into `session.run`, this
operator needs to take in: the input tensor names, the output tensor name,
and the serialized graph definition that the TF frontend was given (this can
be a string attribute). All other parameters and attributes can be inferred
from the graph definition.
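
As a rough sketch of that frontend hook (none of this is real TVM code:
`convert_map` stands in for the frontend's existing operator table, and
`tensorflow_runner` is the proposed op, which does not exist yet):

```python
# Hypothetical sketch of the fallback path in the TF frontend's
# conversion loop. Names here are illustrative stand-ins.
def convert_operator(op_name, inputs, attrs, node, graph_def, fallback=True):
    if op_name in convert_map:
        # Normal path: a Relay conversion rule exists for this op.
        return convert_map[op_name](inputs, attrs)
    if fallback:
        # Emit the proposed TensorFlowRunner op. Tensor names and the
        # serialized GraphDef travel as attributes; everything else is
        # inferred from the graph definition at runtime.
        return tensorflow_runner(
            inputs,
            input_names=list(node.input),
            output_name=node.name + ":0",
            graph_def=graph_def.SerializeToString(),
        )
    raise NotImplementedError("Operator {} is not supported.".format(op_name))
```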

This operator will be implemented as a TOPI contrib library. The first time
the operator is executed, it will create the session from the graph
definition just-in-time and cache it. It will then call `session.run` with
the input tensor names and output tensor name, returning the output tensor.
All subsequent calls to this operator will use the cached session. In fact,
every call to the `TensorFlowRunner` operator within a single graph execution
can share the same session, since `session.run` can be called with different
arguments.
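
To make the runtime side concrete, here is a minimal sketch of the session
caching, assuming TF 1.x-style sessions (`tf.compat.v1`); the function name
and cache key are just illustrative:

```python
import tensorflow as tf

# Cache of sessions keyed by the serialized GraphDef bytes, so every
# TensorFlowRunner call in the same graph execution reuses one session.
_SESSION_CACHE = {}

def tensorflow_runner(graph_def_bytes, input_names, output_name, input_arrays):
    session = _SESSION_CACHE.get(graph_def_bytes)
    if session is None:
        # First call: create the session from the graph definition
        # just-in-time and cache it.
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(graph_def_bytes)
        graph = tf.Graph()
        with graph.as_default():
            tf.import_graph_def(graph_def, name="")
        session = tf.compat.v1.Session(graph=graph)
        _SESSION_CACHE[graph_def_bytes] = session
    # session.run accepts tensor names directly, e.g. "embedding/Gather:0".
    feed_dict = dict(zip(input_names, input_arrays))
    return session.run(output_name, feed_dict=feed_dict)
```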

This feature will be opt-in, as TVM will need to be linked to the frontend 
runtime. We can also add a parameter like `fallback_when_op_not_supported` to 
the `from_tensorflow` method.
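
Usage might then look something like this (hypothetical: the flag does not
exist in `from_tensorflow` today, and the shapes/outputs are made up):

```python
import tvm.relay as relay

# graph_def is a tf.GraphDef loaded from an exported model.
mod, params = relay.frontend.from_tensorflow(
    graph_def,
    shape={"input": (1, 128)},            # example input shape
    outputs=["beam_search/GatherTree"],   # example output node
    fallback_when_op_not_supported=True,  # proposed opt-in flag
)
```

When the flag is off, unsupported ops would fail conversion as they do today.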

I also considered other implementations, such as using a custom TVM op in
TensorFlow, or manually splicing the graph and running each subgraph in its
respective framework. The first approach is challenging because it requires
the user to have the model source code; I believe the correct solution should
work even when we only have the exported graph. The second is challenging
because it requires manually splicing the graph, converting spliced nodes
into explicit inputs and outputs, and handling nodes that "pass through"
between subgraphs when they are not inputs or outputs.

I'm looking forward to hearing what you think!

cc @tqchen @jroesch @jwfromm




