Thanks for your reply, @areusch!

[quote="areusch, post:2, topic:9548"]
Is tensorization an option here, or do you need to do more with the TIR after 
schedule generation?
[/quote]
Yes, I'm currently trying to use tensorization to map entire convolutions and their data preparation steps (data layout transformation, padding) to a single HWLib function call, but I'm afraid the process hasn't been particularly smooth for such coarse computations. [Getting data transformed by TVM seems suboptimal.](https://discuss.tvm.apache.org/t/te-using-reshape-without-copy/9480?u=jossevandelm)
Creating large tensorization intrinsics is also tricky. Right now, for example, it looks like I would have to write a separate TIR pass, because I cannot merge e.g. `Relu(Conv(Pad(ChgDataLayout(input)), filter))` into one intrinsic: [tensorize/TIR does not allow creating an intrinsic with nested computations](https://discuss.tvm.apache.org/t/tensorize-how-to-use-tensorize-for-composition-op-eg-conv-relu/2336?u=jossevandelm). The TIR pass I'm envisioning could detect those sequential operations and merge them into one as a workaround for this problem.
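
For reference, the pattern I've been experimenting with looks roughly like the sketch below: a minimal single-channel convolution mapped onto one extern call, where `hwlib_conv2d` is just a stand-in name for the real HWLib entry point.

```python
import tvm
from tvm import te


def intrin_hwlib_conv2d(h, w, kh, kw):
    """Minimal sketch: map a whole single-channel conv2d onto one
    extern call ("hwlib_conv2d" is a hypothetical HWLib symbol)."""
    data = te.placeholder((h, w), name="data")
    kern = te.placeholder((kh, kw), name="kern")
    rh = te.reduce_axis((0, kh), name="rh")
    rw = te.reduce_axis((0, kw), name="rw")
    out = te.compute(
        (h - kh + 1, w - kw + 1),
        lambda i, j: te.sum(data[i + rh, j + rw] * kern[rh, rw], axis=[rh, rw]),
        name="out",
    )
    # offset_factor=1 lets the intrinsic accept buffers at arbitrary offsets.
    db = tvm.tir.decl_buffer(data.shape, data.dtype, name="D", offset_factor=1)
    kb = tvm.tir.decl_buffer(kern.shape, kern.dtype, name="K", offset_factor=1)
    ob = tvm.tir.decl_buffer(out.shape, out.dtype, name="O", offset_factor=1)

    def intrin_func(ins, outs):
        ib = tvm.tir.ir_builder.create()
        dd, kk = ins
        oo = outs[0]
        # Replace the whole compute region with a single HWLib call.
        ib.emit(tvm.tir.call_extern(
            "int32", "hwlib_conv2d",
            oo.access_ptr("w"), dd.access_ptr("r"), kk.access_ptr("r"),
            h, w, kh, kw,
        ))
        return ib.get()

    return te.decl_tensor_intrin(out.op, intrin_func,
                                 binds={data: db, kern: kb, out: ob})
```

This covers the bare convolution, but as soon as I try to fold the padding, layout change, and ReLU into the same declaration, tensorize rejects it.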

I'm not sure how to write a TIR pass yet, but what I would like to do in the future is to automatically skip some data layout transformations.
Right now the data has to be transformed every time it is sent to and from the accelerator, simply because, for example, most standard convolutions in Relay expect NCHW. We should not be doing data layout transformations when two consecutive operations are both performed on the accelerator.
I'm not sure whether it would be best to implement this as a Relay pass or a TIR pass.
If anyone can confirm that this is possible, or can point me to existing work on it, that would be great, as I've not had time to look into creating my own pass.
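
To make that concrete, I imagine a Relay-level version could pattern-match pairs of inverse `layout_transform` ops and cancel them. A rough, untested sketch using the dataflow pattern language (the pass name and structure are my own guess):

```python
from tvm.relay.dataflow_pattern import (
    DFPatternCallback, is_op, rewrite, wildcard,
)


class CancelInverseLayoutTransforms(DFPatternCallback):
    """Sketch: rewrite layout_transform(layout_transform(x, A->B), B->A)
    back to plain x, so no round-trip transform is emitted between two
    accelerator-resident ops."""

    def __init__(self):
        super().__init__()
        self.inp = wildcard()
        inner = is_op("layout_transform")(self.inp)
        self.pattern = is_op("layout_transform")(inner)

    def callback(self, pre, post, node_map):
        outer = pre
        inner = outer.args[0]
        # Only cancel when the two transforms are exact inverses.
        if (inner.attrs.src_layout == outer.attrs.dst_layout
                and inner.attrs.dst_layout == outer.attrs.src_layout):
            return node_map[self.inp][0]
        return post


# usage (after partitioning for the accelerator):
# mod["main"] = rewrite(CancelInverseLayoutTransforms(), mod["main"])
```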

At some point I'd also like to include some autotuning in the grand scheme of things (probably not with actual timing measurements, but rather with a representative model).
I haven't had time to look into this yet, or into how much effort it would take to implement.
I'm also afraid the gains from autotuning with such coarse tensorization might be quite minimal, though maybe some gains are possible in the RISC-V scheduling; I'm not sure.

[quote="areusch, post:2, topic:9548"]
It seems like you could have a TIR pass that replaces free variables with 
constants after doing that computation.
[/quote]

Okay, I'll be sure to look into this!
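
If I understand the suggestion correctly, something like this minimal sketch might work, assuming integer variables whose values are known at compile time (the variable names and constants below are placeholders I made up):

```python
import tvm


def bind_constants_pass(const_by_name):
    """Sketch: a TIR pass replacing free tir.Vars whose names appear in
    const_by_name (e.g. {"n": 224}) with integer constants."""

    @tvm.tir.transform.prim_func_pass(opt_level=0)
    def _transform(func, mod, ctx):
        vmap = {}

        def visit(node):
            if isinstance(node, tvm.tir.Var) and node.name in const_by_name:
                # Assumes integer-typed variables.
                vmap[node] = tvm.tir.IntImm(node.dtype, const_by_name[node.name])

        tvm.tir.stmt_functor.post_order_visit(func.body, visit)
        return func.with_body(tvm.tir.stmt_functor.substitute(func.body, vmap))

    return _transform


# usage: mod = bind_constants_pass({"n": 224})(mod)
```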

Also, thank you very much for including other people in the discussion!




