Re: [apache/tvm-rfcs] [RFC] Relax Upstreaming (PR #89)

Ligeng Zhu Wed, 09 Nov 2022 15:41:55 -0800

I learned a lot from reading through the thread, and found that most people 
here come from a systems background: either doing related research at 
universities or leading engineering teams at companies. I would like to share 
some thoughts from a different perspective, as a **TVM user** and **ML 
algorithm developer**.


I am a graduate student at MIT studying efficient deep learning algorithms 
and co-design (details on [my page](http://lzhu.me/), the [lab 
site](https://tinyml.mit.edu/) and [our recent project that trains NN on a 
256kB MCU](https://tinytraining.mit.edu/)). We have been loyal TVM users 
because of its flexibility, high performance, and open-source nature. But when 
we want to dive deeper and make customizations, things become complex and 
Relay is no longer friendly:

* **Unnecessarily long call stack between Python and C++**: Take `relay.build` 
as an example: a Relay graph (in Python) first goes through shape checking (in 
C++), then calls into a wrapper (Python), is later fed into TensorExpression 
(either in Python or C++), and is finally fed into the VM for compilation 
(packed functions). Any step in the middle can raise errors, and developers can 
easily get lost in the pipeline. You can find many users reporting similar 
issues on the forum, and only a few of them are fortunate enough to get an 
answer from experienced developers.
* **Difficult to add a new operator because of the complex pipeline**: In our 
research, and in many other users' development, adding new operators is a 
common request. But in current Relay, even if we just want to add a simple 
Identity operator (y = x), we need to:
  1. declare an attribute node.
  2. write the type relation check in C++.
  3. register the op in C++.
  4. describe the compute.
  5. describe the schedule.
  6. wrap it up in C++.
  7. wrap it up in Python.
  Seven steps just to define an identity function? Seriously? In PyTorch it 
would not take more than 20 lines. This significantly slows the growth of the 
TVM community: if you check the [PR 
history](https://github.com/apache/tvm/commits/main/python/tvm/relay/op), the 
number of new operators and new contributors is quite limited this year, 
while PyTorch receives new operator implementations from the community every 
day.  
* **Missing capability to call third-party implementations**: Relay syntax does 
not, at least not easily, support calling third-party backends such as cuDNN, 
OpenVINO, and TensorRT. For the cloud, cuDNN and TensorRT are still SoTA on 
most benchmarks, and the lack of simple integration means inferior performance, 
which will make fewer people choose TVM. For the edge, the situation is even 
more serious because of hardware diversity. Take the Qualcomm DSP as an 
example: even though TVM Hexagon support is in progress, the best solution is 
still the manually written kernels in 
[SNPE](https://developer.qualcomm.com/sites/default/files/docs/snpe/overview.html).
 It is not trivial to call other backends in current Relay: BYOC is difficult 
to use, and registering custom operators can be quite complex, as discussed in 
the previous point.  
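
As a rough illustration of the second point, the sketch below shows what a 
single-step, Python-only operator definition can look like. It is a 
hypothetical, self-contained toy: `OP_REGISTRY`, `register_op`, and `call_op` 
are made-up names for this example and are not TVM, Relax, or PyTorch APIs, and 
plain Python lists stand in for tensors.

```python
# Hypothetical sketch: none of these names are TVM, Relax, or PyTorch APIs.
from typing import Callable, Dict, List

# Global table mapping an operator name to its Python compute function.
OP_REGISTRY: Dict[str, Callable[..., List[float]]] = {}


def register_op(name: str) -> Callable:
    """Register a compute function under `name` in a single step."""
    def decorator(fn: Callable[..., List[float]]) -> Callable[..., List[float]]:
        if name in OP_REGISTRY:
            raise ValueError(f"operator {name!r} is already registered")
        OP_REGISTRY[name] = fn
        return fn
    return decorator


@register_op("identity")
def identity(x: List[float]) -> List[float]:
    # y = x; return a copy so the output does not alias the input list.
    return list(x)


def call_op(name: str, *args):
    """Look up an operator by name and run it, with a readable error on a miss."""
    if name not in OP_REGISTRY:
        raise KeyError(f"unknown operator {name!r}; known: {sorted(OP_REGISTRY)}")
    return OP_REGISTRY[name](*args)
```

With an API of this shape, adding `identity` is one decorated function, and a 
failed lookup reports the unknown name directly in Python rather than 
surfacing from a deeper layer of the stack.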

I understand those who want backward compatibility so that existing projects 
are not broken. But we cannot build a ship of Theseus in the real world, and 
the above issues cannot be easily "improved" within current Relay. If TVM does 
not embrace new designs and improve its user-friendliness, developers will 
eventually switch to other tools, and this is indeed happening: 
* [OneFlow uses MLIR to rewrite their compiler 
pass](https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion)
 to accelerate diffusion models by 4x compared with PyTorch and 1.6x compared 
with TensorRT.
* [Megvii adopts MLIR to minimize the runtime 
build](https://github.com/MegEngine/MegCC), generating a YOLOX binary of just 
95kB.
* [PyTorch proposes TorchDynamo to speed up 
training](https://github.com/pytorch/torchdynamo/) and achieves an average 
1.34x speedup over the previous NVFuser. 
* ... 

I like the TVM project and hope the community stays active. TVM has a huge 
user base of researchers, and Relax can allow them to easily contribute their 
code and ideas to the repo, instead of resorting to tricky hacks and creating 
separate repos for each project. This is important for an open-source 
community: just recall how MXNet lost its market and why PyTorch could beat 
TensorFlow even though it was released a year later. TVM should consider 
upstreaming Relax given its more thoughtful and user-friendly design, 
well-written documentation and tutorials, and painless S0, S1, S2 upgrading.

I would be happy to discuss further if there are any comments or questions.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1309546688
You are receiving this because you are subscribed to this thread.

Message ID: <apache/tvm-rfcs/pull/89/c1309546...@github.com>
