Hi @JosseVanDelm,

Thanks for the post! Some thoughts:

>Right now a lot of calls to the HWlib are very inefficient, as they require a 
>lot of data reformatting on the RISC-V before being accessible to the 
>accelerator. It is weird/annoying that the data layout already gets specified 
>from Relay, we would probably need to insert a data layout (TIR?) optimization 
>pass along the computation graph at some point there.

and

> Our accelerator supports int8, but also int4 and int2. At some point we will 
> probably need to look into the Bring your own datatype framework, but we also 
> still need to look into quantization support in TVM. Any recommended 
> reference work would be very useful here! 

Tagging @jwfromm in case he knows more here.

> We have looked into using BYOC, but we felt like this was a very direct 
> mapping of Relay to instructions, which bypasses a lot of 
> scheduling/optimization magic (Tensor Expressions, AutoTVM) from the rest of 
> the TVM stack. It also did not seem like a very scalable solution to us, 
> since it seems like we would have to map a lot of Relay instructions directly 
> to a HWLib function call, which we also have to develop ourselves.

Is tensorization an option here, or do you need to do more with the TIR after 
schedule generation?
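
To make the question concrete, here is a toy sketch in plain Python of what I mean by tensorization: the innermost loop of a loop nest is replaced by one call into the hardware library. This is not TVM code, and `hwlib_dot4` is a hypothetical stand-in for one of your HWLib functions; I am only assuming the accelerator exposes some fixed-size primitive like this.

```python
TILE = 4  # assumed fixed vector width of the hypothetical intrinsic


def hwlib_dot4(a, b):
    """Hypothetical HWLib intrinsic: dot product of two length-4 vectors."""
    assert len(a) == TILE and len(b) == TILE
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def matvec_scalar(matrix, vec):
    """Reference loop nest: one multiply-accumulate per inner iteration."""
    out = []
    for row in matrix:
        acc = 0
        for k in range(len(vec)):
            acc += row[k] * vec[k]
        out.append(acc)
    return out


def matvec_tensorized(matrix, vec):
    """Same computation with the reduction axis split by TILE.

    The inner TILE-wide loop is replaced by a single intrinsic call,
    which is the rewrite that tensorization performs on a schedule.
    """
    assert len(vec) % TILE == 0
    out = []
    for row in matrix:
        acc = 0
        for ko in range(0, len(vec), TILE):
            acc += hwlib_dot4(row[ko:ko + TILE], vec[ko:ko + TILE])
        out.append(acc)
    return out


matrix = [[1, 2, 3, 4, 5, 6, 7, 8],
          [8, 7, 6, 5, 4, 3, 2, 1]]
vec = [1, 0, 2, 0, 3, 0, 4, 0]
result = matvec_tensorized(matrix, vec)
```

If your HWLib calls can be expressed as fixed-shape primitives like this, the outer loops stay visible to the scheduler and only the innermost tile is handed to the library.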

>We have looked into VTA, but VTA is quite different from our platform. We 
>don’t have a fully fledged workstation host device at hand, apart from the 
>bare metal microcontroller. Also we would like to compile as much as possible 
>statically and AoT, and not in a JIT-fashion. Maybe there are some accelerator 
>specific parts we can reuse though. If someone can share their experience on 
>reusing some of this work that would be very insightful!

This is an area I'm quite interested in, but as far as I know we haven't done 
any work on it yet.

> Some functions of the HWlib require parameters that have to be set during 
> compilation based on the weights. It is not clear to us how this fits in with 
> the rest of the compilation stack. Could this be implemented in a TIR pass 
> for example?

It seems like you could write a TIR pass that performs that computation on the 
weights at compile time and then replaces the corresponding free variables with 
constants.
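
As a rough illustration of that idea, here is a minimal sketch on a toy expression tree in plain Python. None of this is TVM API: `Var`, `Const`, `Call`, `required_shift`, and the `hwlib_conv2d` name are all hypothetical, and the shift computation is just an assumed example of a parameter derived from the weights. In TVM the same substitution would be done on real TIR nodes inside a pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A free variable in the toy IR."""
    name: str


@dataclass(frozen=True)
class Const:
    """An integer constant in the toy IR."""
    value: int


@dataclass(frozen=True)
class Call:
    """A call to a (hypothetical) HWLib function."""
    func: str
    args: tuple


def required_shift(weights):
    """Assumed compile-time computation: the smallest right shift
    that brings the largest absolute weight within the int8 maximum of 127."""
    max_abs = max(abs(w) for w in weights)
    shift = 0
    while (max_abs >> shift) > 127:
        shift += 1
    return shift


def bind_free_vars(expr, bindings):
    """Return a copy of expr with every bound Var replaced by a Const."""
    if isinstance(expr, Var):
        if expr.name in bindings:
            return Const(bindings[expr.name])
        return expr
    if isinstance(expr, Call):
        new_args = tuple(bind_free_vars(a, bindings) for a in expr.args)
        return Call(expr.func, new_args)
    return expr  # constants are left untouched


weights = [300, -1000, 42]
call = Call("hwlib_conv2d", (Var("input"), Var("weights"), Var("shift")))
bound = bind_free_vars(call, {"shift": required_shift(weights)})
```

Variables that are not in the binding map (`input` and `weights` here) stay free, so only the weight-dependent parameter is baked into the call.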

Also tagging @tqchen who may have some more ideas of related work here.

Andrew





---
[Visit Topic](https://discuss.tvm.apache.org/t/feedback-on-tvm-port-to-custom-accelerator/9548/2) to respond.
