I'm a graduate researcher at UW and have been working as a full-time SDE at AWS 
AI for years, mostly on deep learning frameworks and libraries. I think all of 
us agree that dynamic shapes are essential, so I won't spend more time 
emphasizing how important they are. I'm not a contributor to Relax, but I have 
been following it for a long time. I won't pretend to be neutral: I do think it 
is necessary to welcome Relax, rather than just adding dynamic shape support to 
Relay.

The main controversy in this thread is about whether to upgrade Relay 
incrementally or develop a new IR called Relax. I understand that hardware 
companies appreciate stability, and we can see that CUDA didn't change its 
interface drastically over the years, what a miracle! There must have been 
several attempts to develop new languages and compilers for NVIDIA GPUs, but 
CUDA survived. This is a lesson we should learn: design things with a vision of 
the future in mind from the beginning, then maintain them to a high standard, 
improve them incrementally, and stay customer-obsessed.

That is the ideal story, but we should not ignore that, although CUDA was 
invented before the DL era, there were already many high-performance computing 
workloads its designers could refer to. Fortunately, even in 2022, the 
operators used in DL still align closely with HPC ones and are actually simpler 
(it's a world of GEMM). What about the story of (computational) graph-level 
IRs? The dominant workloads in DL change over time, and I would say they cause 
a lot of headaches for framework and compiler designers: first 
CNNs/RNNs/LSTMs/Tree-LSTMs (structural dynamism is one of the challenges Relay 
was designed to tackle, but unfortunately Tree-LSTMs are used nowhere now), 
then Transformers/GNNs (GNNs are not as hot as Transformers because of the 
hardware lottery, but who knows the future). Now we have entered a time when 
model architectures converge but scale grows dramatically: models become larger 
and larger, and engineers and researchers have proposed many techniques 
(checkpointing and rematerialization, quantization, graph substitution, fusion 
and stitching, sparsification and mixture-of-experts, hybrid parallelism) to 
optimize DL workloads at compile time. I'm glad to see that many of them are 
built on TVM, because TVM's design stays up to date and supports new workloads 
quickly. However, Relay's current design cannot take full advantage of these 
new techniques, and the system is trending toward fragility.

Relax is a great opportunity for us to reconsider the graph-level IR design: 
prune the redundancies and add new functionality. It's exciting that we could 
unify different levels of optimization in [TVM 
Unity](https://github.com/apache/tvm-rfcs/pull/91) once Relax is accepted by 
the community. Refactoring makes things simpler, not more complex.

Whenever we find it's time to make changes, TVM embraces new designs. This has 
happened several times in TVM's history. Prior to Relay there was NNVM, which 
was deprecated and completely replaced by Relay. Tensor Expression has limited 
expressiveness, and its schedule tree data structure cannot support 
tensorization elegantly, so we built TensorIR, which is not only backward 
compatible but also opens opportunities for new dialects (Ruihang and I 
designed SparseTIR on top of it, and it works pretty well). AutoTVM cannot 
generate scheduling templates automatically, so we built Ansor and 
MetaSchedule. I would emphasize that **the most important parts of all these 
updates were upstreamed within several months** without breaking backward 
compatibility, which is a credit to our hard-working and open-minded 
contributors and reviewers. Committing to TVM helps these contributors become 
MLC experts, and some of them are PMC members now. I would say none of these 
refactors hurt TVM's reputation; on the contrary, people are impressed by how 
quickly TVM adapts to the future, and they are more willing to try TVM because 
it's open and driven by innovation.

I really don't understand what is different this time, when it comes to Relax. 
We have a bigger community; this is awesome, and I definitely welcome your 
input and constructive suggestions on the future of this project. I view the 
[New Scoped Module 
RFC](https://discuss.tvm.apache.org/t/process-rfc-empowering-new-scoped-module-to-the-project/13617)
 as a contract between industrial developers and researchers/engineers like me 
who work on "toy prototypes": we promise not to touch anything that might 
affect the user experience, and in return we don't want to be discouraged 
because our prototypes cannot be upstreamed and only live in some random GitHub 
repo as toys. I also think the new S0-S1-S2 process is already the most 
painless approach to delivering new designs, and its effect is equivalent to 
*incremental change*. If people take a look at the Relax repo, it already 
contains a huge amount of code and well-written documentation (compare it with 
the official Relay documentation); I think it is super inappropriate to ignore 
these contributors' devotion, especially individual contributors such as 
@LeshengJin. TVM has a huge user base of researchers; they are an important 
part of the community, and they contribute high-quality code rather than just 
hacking.

Regarding the "lower standard than other communities" issue, TVM has high 
standards and we are not talking about standards. If no fundamental changes are 
allowed in DL infrastructures, google should stay at TF 1.0 and never develop 
JAX, and PyTorch should not create so many different compiler infrastructures 
(I want to share [this 
slide](https://chips-compilers-mlsys-22.github.io/assets/slides/PyTorch%20Compilers%20(Compiler%20&%20Chips%20Symposium%202022).pdf)
 again.

It's 5 am in my timezone and I should get some sleep; I'm still recovering from 
a recent illness. These opinions are my own, and I don't speak for any group or 
organization.

Best,
Zihao
