For an Intel x86 target, I would start with the GEMM optimization tutorial, https://tvm.apache.org/docs/tutorials/optimize/opt_gemm.html, which covers the important TVM schedule primitives and their effects. Second, I recommend reading https://tvm.apache.org/docs/tutorials/autotvm/tune_simple_template.html, which shows how to combine AutoTVM with a schedule to improve performance. Third, you can profile with Intel VTune to find the bottleneck in your program (for example, whether loads take up too much of the time, or something else does). Fourth, look at how well-optimized libraries such as Intel oneDNN achieve their performance, then try to implement the same techniques in TVM (possibly even with tensorize). These are my experiences and suggestions.
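
To make the first suggestion a bit more concrete, here is a minimal sketch of what loop blocking (the kind of transformation that schedule primitives such as tile/split and reorder express) does to a matmul loop nest. This is plain Python, not TVM code; the function names and the block size `bn` are hypothetical and chosen only for illustration, and the loop order is only similar in spirit to what the GEMM tutorial arrives at.

```python
def matmul_naive(A, B):
    """Reference triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_blocked(A, B, bn=4):
    """Same computation, with i and j split into bn-sized blocks.

    The outer loops walk over (bn x bn) tiles of C, and the reduction
    loop p is moved outside the inner i/j loops, so each tile of C is
    reused across the whole reduction while it is still hot in cache.
    bn is an illustrative assumption, not a tuned value.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for io in range(0, n, bn):              # outer block loop over rows
        for jo in range(0, m, bn):          # outer block loop over columns
            for p in range(k):              # reduction loop, hoisted outward
                row_b = B[p]
                for i in range(io, min(io + bn, n)):
                    a = A[i][p]
                    row_c = C[i]
                    # innermost loop runs over contiguous j, which is the
                    # loop a vectorizing schedule would target
                    for j in range(jo, min(jo + bn, m)):
                        row_c[j] += a * row_b[j]
    return C
```

Both functions compute the same result; only the iteration order differs. In TVM the same restructuring is described in the schedule rather than by hand-writing the loops, and the block size is the kind of knob that AutoTVM can search over instead of fixing it by hand.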