Hi everyone! I modified this sample (https://tvm.apache.org/docs/tutorials/frontend/from_pytorch.html) to add a desired NHWC layout to a network saved from PyTorch (which uses NCHW):

```python
desired_layouts = {'qnn.conv2d': ['NHWC', 'HWIO'],
                   'nn.conv2d': ['NHWC', 'HWIO']}

# RemoveUnusedFunctions is used to clean up the graph.
seq = tvm.transform.Sequential([relay.transform.RemoveUnusedFunctions(),
                                relay.transform.ConvertLayout(desired_layouts)])

with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)
```
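For reference, `mod` here comes from the tutorial's PyTorch import step; a rough sketch of that setup (the model and input shape are just the tutorial's ResNet-18 defaults, not anything special to my case):

```python
import torch
import torchvision
from tvm import relay

# Trace the torchvision model (NCHW by default) and import it into Relay,
# roughly as the linked tutorial does.
model = torchvision.models.resnet18(pretrained=True).eval()
input_data = torch.randn(1, 3, 224, 224)
scripted_model = torch.jit.trace(model, input_data).eval()
shape_list = [("input0", (1, 3, 224, 224))]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
```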
The dump of `mod` is as expected: both the input/output and each layer's weights come with a layout_transform, for example:

```txt
%0 = layout_transform(%input0, src_layout="NCHW", dst_layout="NHWC") /* ty=Tensor[(1, 224, 224, 3), float32] */;
%1 = layout_transform(%conv1.weight, src_layout="OIHW", dst_layout="HWIO") /* ty=Tensor[(7, 7, 3, 64), float32] */;
%2 = nn.conv2d(%0, %1, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7], data_layout="NHWC", kernel_layout="HWIO") /* ty=Tensor[(1, 112, 112, 64), float32] */;
%3 = nn.batch_norm(%2, %bn1.weight, %bn1.bias, %bn1.running_mean, %bn1.running_var, axis=3) /* ty=(Tensor[(1, 112, 112, 64), float32], Tensor[(64), float32], Tensor[(64), float32]) */;
%4 = %3.0;
%5 = nn.relu(%4) /* ty=Tensor[(1, 112, 112, 64), float32] */;
%6 = nn.max_pool2d(%5, pool_size=[3, 3], strides=[2, 2], padding=[1, 1, 1, 1], layout="NHWC") /* ty=Tensor[(1, 56, 56, 64), float32] */;
%7 = layout_transform(%layer1.0.conv1.weight, src_layout="OIHW", dst_layout="HWIO") /* ty=Tensor[(3, 3, 64, 64), float32] */;
%8 = nn.conv2d(%6, %7, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO") /* ty=Tensor[(1, 56, 56, 64), float32] */;
```

However, when I checked the GPU trace, there are only two layout transforms:

```text
[CUDA memcpy HtoD]
**fused_layout_transform_11_kernel0** [513]
fused_nn_conv2d_add_nn_relu_7_kernel0 [517]
fused_nn_max_pool2d_kernel0 [521]
fused_nn_conv2d_add_nn_relu_6_kernel0 [525]
fused_nn_conv2d_add_add_nn_relu_3_kernel0 [529]
fused_nn_conv2d_add_nn_relu_6_kernel0 [532]
fused_nn_conv2d_add_add_nn_relu_3_kernel0 [535]
fused_nn_conv2d_add_nn_relu_5_kernel0 [539]
fused_nn_conv2d_add_kernel0 [543]
fused_nn_conv2d_add_add_nn_relu_2_kernel0 [547]
fused_nn_conv2d_add_nn_relu_4_kernel0 [551]
fused_nn_conv2d_add_add_nn_relu_2_kernel0 [554]
fused_nn_conv2d_add_nn_relu_3_kernel0 [558]
fused_nn_conv2d_add_1_kernel0 [562]
fused_nn_conv2d_add_add_nn_relu_1_kernel0 [566]
fused_nn_conv2d_add_nn_relu_2_kernel0 [570]
fused_nn_conv2d_add_add_nn_relu_1_kernel0 [573]
fused_nn_conv2d_add_nn_relu_1_kernel0 [577]
fused_nn_conv2d_add_2_kernel0 [581]
fused_nn_conv2d_add_add_nn_relu_kernel0 [585]
fused_nn_conv2d_add_nn_relu_kernel0 [589]
fused_nn_conv2d_add_add_nn_relu_kernel0 [592]
fused_nn_adaptive_avg_pool2d_kernel0 [596]
**fused_layout_transform_reshape_squeeze_kernel0** [600]
fused_nn_dense_add_kernel0 [604]
[CUDA memcpy DtoH]
```

I'm quite confused here. Does this mean all of these kernels accept NHWC input while using OIHW filter parameters? Or does TVM transform the weight parameters in advance, since there is no need to transform the filters more than once? (I sketched one way I would try to check this at the end of the post.)

PS: I'm working on loading a PyTorch model (which is NCHW by default) into TVM and running it in NHWC format only (including the input/output and each conv layer), so I expect there should be no layout_transform at all. Am I right?
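Here is the rough check I had in mind for the "transformed in advance" possibility, assuming `params` is the dict returned by `relay.frontend.from_pytorch` and `mod` already has ConvertLayout applied as above: bind the weights into the graph as constants, run FoldConstant, and see whether the weight layout_transforms disappear from the printed module.

```python
import tvm
from tvm import relay

# Bind the weight params as constants so they become foldable expressions.
mod["main"] = relay.build_module.bind_params_by_name(mod["main"], params)

# Constant-fold; if the weight layout_transforms are pre-computed at compile
# time, they should no longer appear in the folded module.
with tvm.transform.PassContext(opt_level=3):
    folded = relay.transform.FoldConstant()(mod)
print(folded)
```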