[Apache TVM Discuss] [Questions] Do we have any way to process codegen with more fine grade control?

Zhao Wu via Apache TVM Discuss Thu, 06 May 2021 19:30:09 -0700


Yeah, it is unfriendly for Ansor. However, I don't think the two goals
contradict each other. We cannot expect to generate the same asm as ACL, but
we can expect to achieve the same optimizations. In your example, we cannot
easily do the `register blocking` optimization, but we can expect to apply the
`FMA` optimization as ACL does, so we generate `fmla` correctly too. For the
CPU part, in my opinion, even if we cannot generate the same asm snippet, we
may still reach the same level of performance as long as we generate key
instructions like `fmla`. If we do not, there must be some factor we have
overlooked, perhaps an unfriendly memory access pattern that causes a high
cache miss rate, or something else.


Back to Ansor: of course we should improve Ansor's performance. However, for
the most performance-critical part, the GEMM micro kernel, I think the most
practical approach right now is to leverage a micro GEMM kernel (4x4/8x8) and
let Ansor or MetaSchedule schedule the rest (tiling parameters, unrolling,
parallelization, and so on).
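
To make the split described above concrete, here is a minimal pure-Python sketch, not TVM code. The names `micro_kernel_4x4`, `tiled_gemm`, `reference_gemm`, and the parameters `tile_m`, `tile_n`, `tile_k` are all hypothetical and chosen only for illustration. The inner 4x4 kernel is fixed, while the outer tile sizes stand in for the knobs a tuner such as Ansor or MetaSchedule would search over.

```python
def micro_kernel_4x4(a, b, c, i0, j0, k0, kc):
    """Fixed micro kernel: C[i0:i0+4, j0:j0+4] += A[i0:i0+4, k0:k0+kc] @ B[k0:k0+kc, j0:j0+4]."""
    # The 16 accumulators play the role of the register block.
    acc = [[c[i0 + i][j0 + j] for j in range(4)] for i in range(4)]
    for k in range(k0, k0 + kc):
        b_row = b[k]
        for i in range(4):
            a_ik = a[i0 + i][k]
            for j in range(4):
                # Multiply-accumulate; on AArch64 this is the step that
                # should end up as an fmla instruction.
                acc[i][j] += a_ik * b_row[j0 + j]
    # Write the accumulators back once per call.
    for i in range(4):
        for j in range(4):
            c[i0 + i][j0 + j] = acc[i][j]


def tiled_gemm(a, b, tile_m=8, tile_n=8, tile_k=4):
    """GEMM whose outer loops are tunable while the inner 4x4 kernel stays fixed."""
    m, k, n = len(a), len(a[0]), len(b[0])
    # Assumption for brevity: shapes divide evenly, tiles are multiples of 4.
    assert m % tile_m == 0 and n % tile_n == 0 and k % tile_k == 0
    assert tile_m % 4 == 0 and tile_n % 4 == 0
    c = [[0.0] * n for _ in range(m)]
    for io in range(0, m, tile_m):          # tunable outer tiling
        for jo in range(0, n, tile_n):
            for ko in range(0, k, tile_k):
                for ii in range(io, io + tile_m, 4):
                    for ji in range(jo, jo + tile_n, 4):
                        micro_kernel_4x4(a, b, c, ii, ji, ko, tile_k)
    return c


def reference_gemm(a, b):
    """Naive triple loop, used only to check the tiled version."""
    m, k, n = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

Changing `tile_m`, `tile_n`, or `tile_k` changes only the loop structure around the kernel, not the result, which is the property that lets a tuner explore those parameters freely. In TVM itself, the fixed kernel would typically be plugged in through tensorization rather than a Python function call.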





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/do-we-have-any-way-to-process-codegen-with-more-fine-grade-control/9908/7)
 to respond.

