**Problem**

For TVM to work as a generic compiler for frontend frameworks, it needs to
support all of the operators that those frameworks provide. Relay already
supports many operators across numerous frontends, and this operator list
tends to be sufficient for common use cases and model architectures. However,
this approach doesn't generalize to new model architectures, or to model
developers who use unique, specialty operators provided by the framework. It
puts us in a constant state of playing catch-up.

As a more concrete example, I was recently investigating an NLP model that
uses TensorFlow's lookup table for embedding lookup and its
contrib.seq2seq.GatherTree operator as part of beam search. Since neither of
these operators is supported in Relay, I started looking into implementing
them. However, I found it difficult to justify the effort of implementing an
operator in Relay+TOPI for a one-off, potentially esoteric use case. Further,
ops such as the lookup table should already be very fast in TensorFlow, so
there really isn't a need to compile them with TVM.

**Proposed Solution**

I think that unsupported operators should not prevent us from running a
graph. When an operator is not supported, we can fall back and run that
operator in the original framework's runtime. This hybrid approach will
certainly not be as performant as running the entire graph on the TVM
runtime, but it will unblock users running graphs from new model
architectures with brand-new operators.

As I mentioned above, NLP models are a great example of this. Since many
people implement their embedding lookups differently, we cannot be certain
that all of those ops will be supported. However, the core model logic (such
as a Transformer or RNN) is generally supported by TVM. A hybrid approach
will let us run the embedding lookup in the native framework and use TVM to
optimize the core model, which also tends to be the more computationally
expensive part.

**Proposed Implementation**

I propose creating a new operator in TVM that will run a generic graph or 
subgraph in the native frontend framework.

Let's look at an example for TensorFlow:

When we see an operator that is not in the convert map, we can create a Relay
node for a new op called `TensorFlowRunner`. Since TensorFlow can execute
subgraphs by simply passing input/output nodes into `session.run`, this
operator needs to take in: the input tensor names, the output tensor name,
and the serialized graph definition that the TF frontend was given (this can
be a string attribute). All other parameters and attributes can be inferred
from the graph definition.
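
As a rough sketch of that frontend hook (none of this is real TVM code:
`convert_map` stands in for the frontend's existing operator table, and
`tensorflow_runner` is the proposed op, which does not exist yet):

```python
# Hypothetical sketch of the fallback path in the TF frontend's
# conversion loop. Names here are illustrative stand-ins.
def convert_operator(op_name, inputs, attrs, node, graph_def, fallback=True):
    if op_name in convert_map:
        # Normal path: a Relay conversion rule exists for this op.
        return convert_map[op_name](inputs, attrs)
    if fallback:
        # Emit the proposed TensorFlowRunner op. Tensor names and the
        # serialized GraphDef travel as attributes; everything else is
        # inferred from the graph definition at runtime.
        return tensorflow_runner(
            inputs,
            input_names=list(node.input),
            output_name=node.name + ":0",
            graph_def=graph_def.SerializeToString(),
        )
    raise NotImplementedError("Operator {} is not supported.".format(op_name))
```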

This operator will be implemented as a TOPI contrib library. The first time
the operator is executed, it will create the session from the graph
definition just-in-time and cache it. It will then call `session.run` with
the input tensor names and output tensor name, returning the output tensor.
All subsequent calls to this operator will use the cached session. In fact,
every call to the `TensorFlowRunner` operator within a single graph execution
can share the same session, since `session.run` can be called with different
arguments.
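
To make the runtime side concrete, here is a minimal sketch of the session
caching, assuming TF 1.x-style sessions (`tf.compat.v1`); the function name
and cache key are just illustrative:

```python
import tensorflow as tf

# Cache of sessions keyed by the serialized GraphDef bytes, so every
# TensorFlowRunner call in the same graph execution reuses one session.
_SESSION_CACHE = {}

def tensorflow_runner(graph_def_bytes, input_names, output_name, input_arrays):
    session = _SESSION_CACHE.get(graph_def_bytes)
    if session is None:
        # First call: create the session from the graph definition
        # just-in-time and cache it.
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(graph_def_bytes)
        graph = tf.Graph()
        with graph.as_default():
            tf.import_graph_def(graph_def, name="")
        session = tf.compat.v1.Session(graph=graph)
        _SESSION_CACHE[graph_def_bytes] = session
    # session.run accepts tensor names directly, e.g. "embedding/Gather:0".
    feed_dict = dict(zip(input_names, input_arrays))
    return session.run(output_name, feed_dict=feed_dict)
```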

This feature will be opt-in, as TVM will need to be linked to the frontend 
runtime. We can also add a parameter like `fallback_when_op_not_supported` to 
the `from_tensorflow` method.
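
Usage might then look something like this (hypothetical: the flag does not
exist in `from_tensorflow` today, and the shapes/outputs are made up):

```python
import tvm.relay as relay

# graph_def is a tf.GraphDef loaded from an exported model.
mod, params = relay.frontend.from_tensorflow(
    graph_def,
    shape={"input": (1, 128)},            # example input shape
    outputs=["beam_search/GatherTree"],   # example output node
    fallback_when_op_not_supported=True,  # proposed opt-in flag
)
```

When the flag is off, unsupported ops would fail conversion as they do today.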

I also considered other implementations, such as using a custom TVM op in
TensorFlow, or manually splicing the graph and running each subgraph in its
respective framework. The first approach is challenging because it requires
the user to have the model source code; I believe the correct solution should
work even when we only have the exported graph. The second is challenging
because it requires manually splicing the graph, converting spliced nodes
into explicit inputs and outputs, and handling nodes that "pass through"
between subgraphs when they are not inputs or outputs.

I'm looking forward to hearing what you think!

cc @tqchen @jroesch @jwfromm




