Hi @Novice,
Yes, I agree that TVM on Tensor Core GPUs still has a lot of room for optimization. We are currently optimizing the data path between global memory and registers, which we think is a major bottleneck. We are also experimenting with different layouts for both feature maps and weights. We have found that weights in the 'HWOI' layout, as suggested by @Hzfengsy, do improve performance for int8 inference on Tensor Cores.

Thanks,
Shawn Wu
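
For readers unfamiliar with the layout names, here is a minimal NumPy sketch of what reordering weights into 'HWOI' looks like. It assumes the weights start in the common 'OIHW' order (output channels, input channels, kernel height, kernel width), which the post does not state. The helper name `oihw_to_hwoi` and the shapes are hypothetical and chosen only for illustration. This shows the data reordering only, not the TVM schedule or the Tensor Core kernel.

```python
import numpy as np


def oihw_to_hwoi(weights):
    """Reorder conv weights from (O, I, H, W) to (H, W, O, I).

    Hypothetical helper for illustration; the source layout is an assumption.
    """
    if weights.ndim != 4:
        raise ValueError("expected a 4-D weight tensor")
    # New axis order: H (old axis 2), W (old axis 3), O (old axis 0), I (old axis 1).
    # ascontiguousarray makes the new order the actual memory order.
    return np.ascontiguousarray(np.transpose(weights, (2, 3, 0, 1)))


# Hypothetical shapes: 64 output channels, 32 input channels, 3x3 kernel.
shape_oihw = (64, 32, 3, 3)
# Deterministic int8 test data covering the range [-128, 127].
w_oihw = (np.arange(np.prod(shape_oihw)) % 256 - 128).astype(np.int8)
w_oihw = w_oihw.reshape(shape_oihw)

w_hwoi = oihw_to_hwoi(w_oihw)
print(w_hwoi.shape)
```

After the reorder, the element at `w_oihw[o, i, h, w]` sits at `w_hwoi[h, w, o, i]`, so the input-channel axis is the innermost (fastest-varying) dimension in memory.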