All we need is **a target backend that can emit and optimize intrinsic IR**. Let's take a look at what we've done in AKG, a tensor compiler for the Davinci core built on top of TVM.

**Why do we do this?**
1) The NPU has more SIMD intrinsics…

Yeah, it is unfriendly for Ansor. However, I think it is not contradictory. We cannot expect to generate ASM the way ACL does, but we can expect to achieve the same level of optimization. For example, your example is that we cannot do the `register blocking` optimization easily, but we could expect we ha…
The parser and printer are ready on mainline and support manipulating either TensorIR or low-level TIR. @vinx13 is playing with them right now on GPU codegen. The TensorIR lowering process is not yet fully on mainline, but we expect it very soon. @Hzfengsy is almost ready to submit a PR.
@junrushao1994 Yeah, I see, but it seems we're not yet able to lower & build a TIR module on the master branch right now? :laughing:
(Maybe I can give it a try on the TensorIR private branch...)
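(For reference, on versions where the lowering pieces have landed, building such an IRModule is a one-liner; this sketch reuses `MyModule` from the snippet above.)

```python
import numpy as np
import tvm

# Compile the TVMScript module from the previous sketch and run it.
lib = tvm.build(MyModule, target="llvm")
dev = tvm.cpu()
a = tvm.nd.array(np.arange(128, dtype="float32"), dev)
b = tvm.nd.empty((128,), "float32", dev)
lib["main"](a, b)
```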
@FrozenGene I agree.
When we want to do some advanced optimization like `register blocking` to achieve your goal, TVM codegen cannot handle it very well. My experience is:
1. write a micro GEMM like `4x4` or `8x8` and then tensorize it (see the sketch after this list);
2. try, try, and try different schedules and find one combination to match yo…
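As a rough illustration of step 1, this is what declaring a `4x4` micro-GEMM intrinsic for `tensorize` can look like in the TE schedule flow. `gemm_4x4_update` is a hypothetical extern symbol standing in for the hand-written (e.g. ASM) micro-kernel you link in separately:

```python
import tvm
from tvm import te

def intrin_gemm_4x4():
    """Declare a 4x4 float32 micro-GEMM update intrinsic for tensorize."""
    A = te.placeholder((4, 4), name="A", dtype="float32")
    B = te.placeholder((4, 4), name="B", dtype="float32")
    k = te.reduce_axis((0, 4), name="k")
    C = te.compute((4, 4), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    # Buffers with symbolic strides so the intrinsic can match 4x4 tiles
    # of a larger matrix, not just a standalone 4x4 buffer.
    Ab = tvm.tir.decl_buffer(A.shape, A.dtype, name="Ab", offset_factor=1,
                             strides=[te.var("sa"), 1])
    Bb = tvm.tir.decl_buffer(B.shape, B.dtype, name="Bb", offset_factor=1,
                             strides=[te.var("sb"), 1])
    Cb = tvm.tir.decl_buffer(C.shape, C.dtype, name="Cb", offset_factor=1,
                             strides=[te.var("sc"), 1])

    def intrin_func(ins, outs):
        aa, bb = ins
        cc = outs[0]
        ib = tvm.tir.ir_builder.create()
        # "gemm_4x4_update" is the hypothetical register-blocked kernel.
        ib.emit(tvm.tir.call_extern("int32", "gemm_4x4_update",
                                    cc.access_ptr("w"),
                                    aa.access_ptr("r"),
                                    bb.access_ptr("r")))
        return ib.get()

    return te.decl_tensor_intrin(C.op, intrin_func, binds={A: Ab, B: Bb, C: Cb})
```

You would then split the GEMM loops into `4x4` tiles and apply `s[C].tensorize(inner_i, intrin_gemm_4x4())` on the innermost tile, so TVM emits a call to the micro-kernel instead of scalar code.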
I think our frontend tutorials are the closest thing we have to a "model zoo". I agree that a larger collection of ready-to-run models, preferably with auto-tuned configs, would be valuable.
Recently I looked at [the model zoo in OpenVINO](https://github.com/openvinotoolkit/open_model_zoo)…
We have a similar observation: LLVM is unable to produce exactly what we want when it comes to very low-level control (e.g. registers, pipeline depth, etc.). A way to obtain fine-grained control is to embed TVM intrinsics that can be lowered to ASM.
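For example (a hedged sketch; the exact intrinsic name and availability depend on your target and TVM version, and older versions used `call_llvm_intrin` instead), TIR can call an LLVM intrinsic directly, which the LLVM backend then lowers to the corresponding machine instruction:

```python
import tvm
from tvm import te

n = 64
A = te.placeholder((n,), name="A", dtype="float32")
B = te.placeholder((n,), name="B", dtype="float32")
C = te.placeholder((n,), name="C", dtype="float32")

# Call llvm.fmuladd per element; LLVM maps it to an FMA instruction
# where the target has one. The leading uint32 constant is the
# argument count expected by call_llvm_pure_intrin.
D = te.compute(
    (n,),
    lambda i: tvm.tir.call_llvm_pure_intrin(
        "float32", "llvm.fmuladd.f32",
        tvm.tir.const(3, "uint32"), A[i], B[i], C[i]),
    name="D",
)
```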
BTW, if you would like to play around wi…