Hi xiaocenxiaocen,
Thanks. I will follow up on this paper.
Best wishes,
Shawn Wu
---
Hi @Novice,
Yes, I agree that TVM on Tensor Core GPUs does have a lot of room for optimization.
Currently we are optimizing the data path between global memory and registers,
which we believe is a major bottleneck. We are experimenting with different
layouts for both feature maps and weights; a sketch of the idea is shown below.
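As a rough illustration of what staging the global-memory-to-register data path means, here is a minimal TVM `te` sketch that moves a tile from global memory through shared memory into registers. The shapes, the identity-copy compute, and the plain `local` scope are placeholders of ours, not code from the PR (the real Tensor Core schedule stages `wmma` fragments and uses cooperative loads):

```python
import tvm
from tvm import te

# Illustrative NHWC shape; not taken from the PR.
N, H, W, C = 16, 14, 14, 64
data = te.placeholder((N, H, W, C), dtype="float16", name="data")
# Identity copy as a stand-in for the real transform compute.
out = te.compute((N, H, W, C), lambda n, h, w, c: data[n, h, w, c], name="out")

s = te.create_schedule(out.op)
data_shared = s.cache_read(data, "shared", [out])       # global -> shared
data_local = s.cache_read(data_shared, "local", [out])  # shared -> registers

fused = s[out].fuse(*s[out].op.axis)
bx, tx = s[out].split(fused, factor=128)
s[out].bind(bx, te.thread_axis("blockIdx.x"))
s[out].bind(tx, te.thread_axis("threadIdx.x"))
# Cooperative loading is left unoptimized in this sketch; a real schedule
# would split the shared-memory copy across the threads of the block.
s[data_shared].compute_at(s[out], bx)
s[data_local].compute_at(s[out], tx)

mod = tvm.build(s, [data, out], target="cuda")
print(mod.imported_modules[0].get_source())  # inspect the generated CUDA
```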
---
**Introduction**
We optimized the Winograd conv2d algorithm for Tensor Core with the NHWC layout.
The Winograd algorithm consists of four modules: feature-map transform, kernel
transform, inverse transform, and batched GEMM (bgemm); a single-tile sketch follows.
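To make the four modules concrete, here is a minimal single-tile NumPy sketch of F(2x2, 3x3) using the standard Winograd transform matrices. This is our own illustration of the algorithm, not code from the implementation; in the real kernel the elementwise product becomes a bgemm over tiles and channels:

```python
import numpy as np

# Standard F(2x2, 3x3) transform matrices (Lavin & Gray).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=np.float32)
G = np.array([[1, 0, 0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0, 0, 1]], dtype=np.float32)
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=np.float32)

d = np.random.rand(4, 4).astype(np.float32)  # input tile
g = np.random.rand(3, 3).astype(np.float32)  # kernel

U = G @ g @ G.T      # kernel transform
V = B_T @ d @ B_T.T  # feature-map transform
M = U * V            # elementwise product (a bgemm in the batched case)
Y = A_T @ M @ A_T.T  # inverse transform -> 2x2 output tile

# Reference: direct 3x3 correlation over the 4x4 tile.
ref = np.zeros((2, 2), dtype=np.float32)
for i in range(2):
    for j in range(2):
        ref[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
np.testing.assert_allclose(Y, ref, rtol=1e-5)
```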
The following major functions were added:
1. Conv2d_nhwc_winograd_t
---
We are pleased to share the code. Please check the PR: [TOPI][Tensor Core]
Conv2d and Dense ops support on Tensor Core #5099. You can find the code in
topi/python/topi/cuda/conv2d_nhwc.py, which uses the same layout as the Tensor
Core conv2d; a usage sketch is shown below.
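For reference, a minimal sketch of how one might exercise an NHWC fp16 conv2d through Relay on a CUDA target. The shapes here are our own illustrative choices; whether the Tensor Core schedule from the PR is actually selected depends on the shape and dtype constraints it checks:

```python
import tvm
from tvm import relay

# Illustrative NHWC fp16 workload; shapes chosen by us, not from the PR.
data = relay.var("data", shape=(16, 14, 14, 64), dtype="float16")
weight = relay.var("weight", shape=(3, 3, 64, 64), dtype="float16")  # HWIO
conv = relay.nn.conv2d(
    data, weight,
    padding=(1, 1),
    data_layout="NHWC",
    kernel_layout="HWIO",
    out_dtype="float16",
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))

# Compile for CUDA; on a Tensor Core GPU the NHWC schedule can be picked
# when the workload satisfies the constraints checked in the PR.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")
```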
For any questions, please feel free to ask.