Hi @Novice,

Yes, I agree that TVM on Tensor Core GPUs still has a lot of room for optimization. 
We are currently optimizing the data path between global memory and registers, 
which we think is a major bottleneck. We are experimenting with different 
layouts for both feature maps and weights. We have found that weights 
in 'HWOI' layout, as suggested by @Hzfengsy, do improve performance for int8 
inference on Tensor Cores. 
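
For anyone who wants to try the layout change, here is a minimal NumPy sketch of repacking weights into 'HWOI'. It assumes the weights start out in 'OIHW' order (which may not match your model), and the helper name `oihw_to_hwoi` and the shapes are made up for illustration. It only shows the data reordering, not our actual TVM schedule.

```python
import numpy as np

def oihw_to_hwoi(weights):
    """Reorder a 4-D conv weight tensor from OIHW to HWOI (hypothetical helper)."""
    if weights.ndim != 4:
        raise ValueError("expected a 4-D weight tensor")
    # Source axes: O=0, I=1, H=2, W=3. Target order: H, W, O, I.
    # ascontiguousarray makes the new order the actual memory order.
    return np.ascontiguousarray(np.transpose(weights, (2, 3, 0, 1)))

# Illustrative shape only: 64 output channels, 32 input channels, 3x3 kernel.
w_oihw = np.random.randint(-128, 127, size=(64, 32, 3, 3), dtype=np.int8)
w_hwoi = oihw_to_hwoi(w_oihw)
print(w_hwoi.shape)  # (3, 3, 64, 32)
```

After the repack, the input-channel axis is innermost in memory, so the values for one (h, w, o) position are stored contiguously.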

Thanks,   
Shawn Wu




