@FrozenGene @tqchen Thanks for your advice.
I have written my own schedule and AutoTVM template. I also tried Intel oneDNN
through BYOC, following the BYOC tutorial. Currently my schedule outperforms oneDNN
by about 0.4 ms on a single CPU core.

I profiled it with VTune and collected a hotspots report.
![image|690x314](upload://k0KC0YOjn9H53lk6CZSmYaifdPf.jpeg) 

The schedule does a lot of unrolling and tiling. There seems to be no obvious
bottleneck.
1. Do you have any further suggestions for analyzing this with VTune?
2. `__dlopen` appears at the top of the hotspots list. Why?
