For a Winograd implementation with large batch sizes, you could refer to this paper:
https://dl.acm.org/doi/pdf/10.1145/3332466.3374520.   
The authors implement an assembler for the Volta/Turing architectures and use the CHWN layout 
for their large-batch Winograd algorithm.
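
In case it helps, here is a rough sketch of what CHWN means in terms of memory addresses, compared with the usual NCHW. It is only my own illustration, not code from the paper: the helper names `offset_nchw` / `offset_chwn` and the shape are made up, and I am assuming plain row-major storage. The point is that in CHWN the batch index is innermost, so the same pixel of consecutive images sits at consecutive addresses. My guess is that this is part of why the layout suits large batches, but please check the paper for their actual reasoning.

```python
def offset_nchw(n, c, h, w, N, C, H, W):
    """Flat offset of element (n, c, h, w) in a row-major NCHW tensor."""
    return ((n * C + c) * H + h) * W + w


def offset_chwn(n, c, h, w, N, C, H, W):
    """Flat offset of the same element in a row-major CHWN tensor."""
    return ((c * H + h) * W + w) * N + n


# Hypothetical shape, chosen only for illustration.
N, C, H, W = 64, 3, 8, 8

# Distance in memory between the same pixel of two consecutive images.
batch_stride_nchw = offset_nchw(1, 0, 0, 0, N, C, H, W) - offset_nchw(0, 0, 0, 0, N, C, H, W)
batch_stride_chwn = offset_chwn(1, 0, 0, 0, N, C, H, W) - offset_chwn(0, 0, 0, 0, N, C, H, W)

print(batch_stride_nchw)  # C * H * W = 192
print(batch_stride_chwn)  # 1
```

So with NCHW, stepping to the next image jumps by `C * H * W` elements, while with CHWN it is a single element.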




