Hi everyone!

I modified this sample (https://tvm.apache.org/docs/tutorials/frontend/from_pytorch.html) to set a desired layout of NHWC for the network exported from PyTorch (which uses NCHW):
```python
import tvm
from tvm import relay

desired_layouts = {'qnn.conv2d': ['NHWC', 'HWIO'],
                   'nn.conv2d': ['NHWC', 'HWIO']}

# RemoveUnusedFunctions is used to clean up the graph.
seq = tvm.transform.Sequential([relay.transform.RemoveUnusedFunctions(),
                                relay.transform.ConvertLayout(desired_layouts)])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)
```
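
For completeness, `mod` and `params` above come from the tutorial's PyTorch import. Roughly (a minimal sketch assuming torchvision's resnet18 and the input name `input0`, as in the tutorial):

```python
import torch
import torchvision
from tvm import relay

# Trace a torchvision model exactly as the from_pytorch tutorial does.
model = torchvision.models.resnet18(pretrained=True).eval()
input_shape = (1, 3, 224, 224)
scripted_model = torch.jit.trace(model, torch.randn(input_shape)).eval()

# Import the traced TorchScript module into Relay.
shape_list = [("input0", input_shape)]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
```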

The dump of `mod` looks as expected: the input/output and each layer's weights come with a `layout_transform`, for example:
```txt
  %0 = layout_transform(%input0, src_layout="NCHW", dst_layout="NHWC") /* ty=Tensor[(1, 224, 224, 3), float32] */;
  %1 = layout_transform(%conv1.weight, src_layout="OIHW", dst_layout="HWIO") /* ty=Tensor[(7, 7, 3, 64), float32] */;
  %2 = nn.conv2d(%0, %1, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7], data_layout="NHWC", kernel_layout="HWIO") /* ty=Tensor[(1, 112, 112, 64), float32] */;
  %3 = nn.batch_norm(%2, %bn1.weight, %bn1.bias, %bn1.running_mean, %bn1.running_var, axis=3) /* ty=(Tensor[(1, 112, 112, 64), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
  %4 = %3.0;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 112, 112, 64), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1, 1, 1], layout="NHWC") /* ty=Tensor[(1, 56, 56, 64), float32] */;
  %7 = layout_transform(%layer1.0.conv1.weight, src_layout="OIHW", dst_layout="HWIO") /* ty=Tensor[(3, 3, 64, 64), float32] */;
  %8 = nn.conv2d(%6, %7, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO") /* ty=Tensor[(1, 56, 56, 64), float32] */;
```
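
For reference, I then build for CUDA and run the model under nvprof to collect the kernel trace, roughly like this (a sketch using the graph executor API from recent TVM; the input tensor is just a placeholder, and on older versions the device handle is `tvm.gpu(0)` instead of `tvm.cuda(0)`):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

target = "cuda"
dev = tvm.cuda(0)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run one inference so the kernel launches show up in the profiler trace.
m = graph_executor.GraphModule(lib["default"](dev))
m.set_input("input0", tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32")))
m.run()
out = m.get_output(0)
```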

However, when I checked the GPU trace, there are only two layout-transform kernels:
```text
[CUDA memcpy HtoD]
**fused_layout_transform_11_kernel0** [513]
fused_nn_conv2d_add_nn_relu_7_kernel0 [517]
fused_nn_max_pool2d_kernel0 [521]
fused_nn_conv2d_add_nn_relu_6_kernel0 [525]
fused_nn_conv2d_add_add_nn_relu_3_kernel0 [529]
fused_nn_conv2d_add_nn_relu_6_kernel0 [532]
fused_nn_conv2d_add_add_nn_relu_3_kernel0 [535]
fused_nn_conv2d_add_nn_relu_5_kernel0 [539]
fused_nn_conv2d_add_kernel0 [543]
fused_nn_conv2d_add_add_nn_relu_2_kernel0 [547]
fused_nn_conv2d_add_nn_relu_4_kernel0 [551]
fused_nn_conv2d_add_add_nn_relu_2_kernel0 [554]
fused_nn_conv2d_add_nn_relu_3_kernel0 [558]
fused_nn_conv2d_add_1_kernel0 [562]
fused_nn_conv2d_add_add_nn_relu_1_kernel0 [566]
fused_nn_conv2d_add_nn_relu_2_kernel0 [570]
fused_nn_conv2d_add_add_nn_relu_1_kernel0 [573]
fused_nn_conv2d_add_nn_relu_1_kernel0 [577]
fused_nn_conv2d_add_2_kernel0 [581]
fused_nn_conv2d_add_add_nn_relu_kernel0 [585]
fused_nn_conv2d_add_nn_relu_kernel0 [589]
fused_nn_conv2d_add_add_nn_relu_kernel0 [592]
fused_nn_adaptive_avg_pool2d_kernel0 [596]
**fused_layout_transform_reshape_squeeze_kernel0** [600]
fused_nn_dense_add_kernel0 [604]
[CUDA memcpy DtoH]
```

I'm quite confused here. Does this mean all of these kernels support NHWC input while still using OIHW filter parameters? Or does TVM transform the weight parameters in advance, since there is no need to transform the filters more than once?
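
One way I can think of to check the second guess is to bind the params as constants and run FoldConstant after ConvertLayout; if TVM folds the weight-side transforms at compile time, they should disappear from the printed module. A rough sketch (assuming `bind_params_by_name` mirrors what `relay.build` does internally when `params` are passed):

```python
import tvm
from tvm import relay
from tvm.relay import build_module

# Turn the weight variables into constants so constant folding can run on them.
mod["main"] = build_module.bind_params_by_name(mod["main"], params)

with tvm.transform.PassContext(opt_level=3):
    mod = tvm.transform.Sequential([
        relay.transform.ConvertLayout(desired_layouts),
        relay.transform.FoldConstant(),
    ])(mod)

print(mod)  # weight-side layout_transforms should be gone if they were folded
```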

PS: I'm working on loading a PyTorch model (which is NCHW by default) into TVM and running it entirely in NHWC format (including the input/output and every conv layer), so I expect there should be no layout_transform at all. Am I right?




