Hi, @Hzfengsy @Shawn_Inspur :slightly_smiling_face:
Thanks for your efforts on supporting Tensor Cores in TVM.

I have tuned Tensor Core schedules on classical networks such as ResNet-50 and VGG-16 (batch size 32). The tensor_precision_fu_utilization metric reported by nvprof shows that I only get Mid/Low utilization on the Tensor Cores:
```
            Invocations                       Metric Name                         Metric Description         Min         Max         Avg
    Kernel: fused_nn_conv2d_add_nn_relu_2_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (4)     Mid (4)     Mid (4)
    Kernel: fused_nn_softmax_kernel3
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization    Idle (0)    Idle (0)    Idle (0)
    Kernel: fused_nn_conv2d_add_nn_relu_3_kernel0
          4   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (4)     Mid (4)     Mid (4)
    Kernel: fused_nn_conv2d_add_nn_relu_4_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (4)     Mid (4)     Mid (4)
    Kernel: fused_nn_batch_flatten_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization    Idle (0)    Idle (0)    Idle (0)
    Kernel: fused_nn_conv2d_add_nn_relu_5_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (4)     Mid (4)     Mid (4)
    Kernel: fused_nn_conv2d_add_nn_relu_6_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Mid (4)     Mid (4)     Mid (4)
    Kernel: fused_nn_dense_add_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Low (2)     Low (2)     Low (2)
    Kernel: fused_nn_conv2d_add_nn_relu_7_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization     Low (3)     Low (3)     Low (3)
    Kernel: fused_nn_conv2d_add_nn_relu_8_kernel0
          2   tensor_precision_fu_utilization   Tensor-Precision Function Unit Utilization    Idle (0)    Idle (0)    Idle (0)
    Kernel: fused_nn_conv2d_add_nn_relu_kernel0
```
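
For reference, the table above was collected with nvprof's metric mode, invoked roughly like this (`run_model.py` is just a placeholder for whatever script runs the tuned network):
```
nvprof --metrics tensor_precision_fu_utilization python run_model.py
```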

But when I use cuDNN as the backend, the utilization is always High.
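
For comparison, this is roughly how I enable the cuDNN path; a minimal sketch, assuming `mod` and `params` hold the Relay module and weights of the network:
```
import tvm
from tvm import relay

# Assumption: `mod` and `params` come from a Relay frontend importer,
# e.g. relay.frontend.from_mxnet or relay.frontend.from_onnx.
target = "cuda -libs=cudnn"  # offload conv2d/dense to cuDNN kernels

# Returns the compiled runtime module (exact return type varies by TVM version).
lib = relay.build(mod, target=target, params=params)
```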

It seems there is still a lot of room for further optimization. Do you have any idea how to get higher utilization out of the Tensor Cores?




