Alright, I did some profiling. I took [this](https://github.com/apache/incubator-tvm/blob/master/tutorials/autotvm/tune_relay_x86.py) tutorial as a basis, using only VGG-16.
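One note on reading the shapes in the debug-runtime table below: they appear to be in TVM's tiled NCHWc layout, i.e. `(N, C_outer, H, W, c_inner)`, so a node shape like `(1, 2, 112, 112, 64)` corresponds to plain NCHW `(1, 128, 112, 112)`. A tiny pure-Python helper (my own sketch, not a TVM API) to fold them back:

```python
def nchwc_to_nchw(shape):
    """Fold a tiled NCHWc shape (N, C_outer, H, W, c_inner) back to NCHW."""
    n, c_outer, h, w, c_inner = shape
    return (n, c_outer * c_inner, h, w)

# Shapes taken from the debug-runtime table:
print(nchwc_to_nchw((1, 2, 112, 112, 64)))  # -> (1, 128, 112, 112)
print(nchwc_to_nchw((1, 32, 28, 28, 16)))   # -> (1, 512, 28, 28)
```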
This is the **output** of the **debug runtime**:

```
Node Name                                     Ops                                          Time(us)    Time(%)  Shape                 Inputs  Outputs
---------                                     ---                                          --------    -------  -----                 ------  -------
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5  107147.0    12.758   (1, 2, 112, 112, 64)  3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7  99490.1     11.847   (1, 2, 224, 224, 32)  3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1  96205.3     11.456   (1, 32, 28, 28, 16)   3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11  fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1  96064.8     11.439   (1, 32, 28, 28, 16)   3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_31  fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3  94304.2     11.229   (1, 8, 56, 56, 32)    3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3  94182.4     11.215   (1, 8, 56, 56, 32)    3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6  52424.5     6.242    (1, 2, 112, 112, 64)  3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4  48494.9     5.775    (1, 8, 56, 56, 32)    3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2  45365.3     5.402    (1, 16, 28, 28, 32)   3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu1    fused_nn_contrib_conv2d_NCHWc_add_nn_relu    23580.5     2.808    (1, 32, 14, 14, 16)   3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu     fused_nn_contrib_conv2d_NCHWc_add_nn_relu    23557.3     2.805    (1, 32, 14, 14, 16)   3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu2    fused_nn_contrib_conv2d_NCHWc_add_nn_relu    23549.4     2.804    (1, 32, 14, 14, 16)   3       1
fused_nn_dense_add_nn_relu_1                  fused_nn_dense_add_nn_relu_1                 19578.8     2.331    (1, 4096)             3       1
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8   fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8  5776.86     0.688    (1, 2, 224, 224, 32)  3       1
fused_nn_dense_add_nn_relu                    fused_nn_dense_add_nn_relu                   3206.76     0.382    (1, 4096)             3       1
fused_layout_transform_19                     fused_layout_transform_19                    2011.35     0.24     (1, 4, 224, 224, 16)  1       1
fused_nn_max_pool2d_4                         fused_nn_max_pool2d_4                        964.965     0.115    (1, 2, 112, 112, 32)  1       1
fused_nn_dense_add                            fused_nn_dense_add                           784.874     0.093    (1, 1000)             3       1
fused_layout_transform_171                    fused_layout_transform_17                    561.36      0.067    (1, 1, 56, 56, 256)   1       1
fused_layout_transform_17                     fused_layout_transform_17                    559.474     0.067    (1, 1, 56, 56, 256)   1       1
fused_nn_max_pool2d_3                         fused_nn_max_pool2d_3                        495.34      0.059    (1, 2, 56, 56, 64)    1       1
fused_layout_transform_15                     fused_layout_transform_15                    289.293     0.034    (1, 1, 28, 28, 512)   1       1
fused_layout_transform_14                     fused_layout_transform_14                    232.466     0.028    (1, 1, 28, 28, 512)   1       1
fused_nn_max_pool2d_2                         fused_nn_max_pool2d_2                        221.099     0.026    (1, 8, 28, 28, 32)    1       1
fused_layout_transform_nn_batch_flatten       fused_layout_transform_nn_batch_flatten      179.002     0.021    (1, 25088)            1       1
fused_layout_transform_16                     fused_layout_transform_16                    135.138     0.016    (1, 1, 28, 28, 256)   1       1
fused_nn_max_pool2d_1                         fused_nn_max_pool2d_1                        109.964     0.013    (1, 32, 14, 14, 16)   1       1
fused_layout_transform_18                     fused_layout_transform_18                    101.463     0.012    (1, 4, 56, 56, 32)    1       1
fused_layout_transform_20                     fused_layout_transform_20                    66.265      0.008    (1, 1, 224, 224, 3)   1       1
fused_layout_transform_13                     fused_layout_transform_13                    46.61       0.006    (1, 1, 14, 14, 512)   1       1
fused_layout_transform_131                    fused_layout_transform_13                    45.106      0.005    (1, 1, 14, 14, 512)   1       1
fused_layout_transform_132                    fused_layout_transform_13                    38.617      0.005    (1, 1, 14, 14, 512)   1       1
fused_nn_max_pool2d                           fused_nn_max_pool2d                          24.039      0.003    (1, 32, 7, 7, 16)     1       1
fused_nn_softmax                              fused_nn_softmax                             16.067      0.002    (1, 1000)             1       1
Total_time                                    -                                            839810.613  -        -                     -       -
```

I also printed out the **Relay IR** for main:

```
fn (%data: Tensor[(1, 3, 224, 224), float32]) -> Tensor[(1, 1000), float32] {
  %0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %1 = nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(64), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %2 = nn.relu(%1) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %3 = nn.conv2d(%2, meta[relay.Constant][2] /* ty=Tensor[(64, 64, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %4 = nn.bias_add(%3, meta[relay.Constant][3] /* ty=Tensor[(64), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %6 = nn.max_pool2d(%5, pool_size=[2, 2], strides=[2, 2], layout="NCHW32c") /* ty=Tensor[(1, 64, 112, 112), float32] */;
  %7 = nn.conv2d(%6, meta[relay.Constant][4] /* ty=Tensor[(128, 64, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %8 = nn.bias_add(%7, meta[relay.Constant][5] /* ty=Tensor[(128), float32] */) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %9 = nn.relu(%8) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %10 = nn.conv2d(%9, meta[relay.Constant][6] /* ty=Tensor[(128, 128, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=128, kernel_size=[3, 3]) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %11 = nn.bias_add(%10, meta[relay.Constant][7] /* ty=Tensor[(128), float32] */) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %12 = nn.relu(%11) /* ty=Tensor[(1, 128, 112, 112), float32] */;
  %13 = nn.max_pool2d(%12, pool_size=[2, 2], strides=[2, 2], layout="NCHW64c") /* ty=Tensor[(1, 128, 56, 56), float32] */;
  %14 = nn.conv2d(%13, meta[relay.Constant][8] /* ty=Tensor[(256, 128, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %15 = nn.bias_add(%14, meta[relay.Constant][9] /* ty=Tensor[(256), float32] */) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %16 = nn.relu(%15) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %17 = nn.conv2d(%16, meta[relay.Constant][10] /* ty=Tensor[(256, 256, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %18 = nn.bias_add(%17, meta[relay.Constant][11] /* ty=Tensor[(256), float32] */) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %19 = nn.relu(%18) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %20 = nn.conv2d(%19, meta[relay.Constant][12] /* ty=Tensor[(256, 256, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=256, kernel_size=[3, 3]) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %21 = nn.bias_add(%20, meta[relay.Constant][13] /* ty=Tensor[(256), float32] */) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %22 = nn.relu(%21) /* ty=Tensor[(1, 256, 56, 56), float32] */;
  %23 = nn.max_pool2d(%22, pool_size=[2, 2], strides=[2, 2], layout="NCHW32c") /* ty=Tensor[(1, 256, 28, 28), float32] */;
  %24 = nn.conv2d(%23, meta[relay.Constant][14] /* ty=Tensor[(512, 256, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %25 = nn.bias_add(%24, meta[relay.Constant][15] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %26 = nn.relu(%25) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %27 = nn.conv2d(%26, meta[relay.Constant][16] /* ty=Tensor[(512, 512, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %28 = nn.bias_add(%27, meta[relay.Constant][17] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %29 = nn.relu(%28) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %30 = nn.conv2d(%29, meta[relay.Constant][18] /* ty=Tensor[(512, 512, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %31 = nn.bias_add(%30, meta[relay.Constant][19] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %32 = nn.relu(%31) /* ty=Tensor[(1, 512, 28, 28), float32] */;
  %33 = nn.max_pool2d(%32, pool_size=[2, 2], strides=[2, 2], layout="NCHW16c") /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %34 = nn.conv2d(%33, meta[relay.Constant][20] /* ty=Tensor[(512, 512, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %35 = nn.bias_add(%34, meta[relay.Constant][21] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %36 = nn.relu(%35) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %37 = nn.conv2d(%36, meta[relay.Constant][22] /* ty=Tensor[(512, 512, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %38 = nn.bias_add(%37, meta[relay.Constant][23] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %39 = nn.relu(%38) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %40 = nn.conv2d(%39, meta[relay.Constant][24] /* ty=Tensor[(512, 512, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3]) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %41 = nn.bias_add(%40, meta[relay.Constant][25] /* ty=Tensor[(512), float32] */) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %42 = nn.relu(%41) /* ty=Tensor[(1, 512, 14, 14), float32] */;
  %43 = nn.max_pool2d(%42, pool_size=[2, 2], strides=[2, 2], layout="NCHW16c") /* ty=Tensor[(1, 512, 7, 7), float32] */;
  %44 = nn.batch_flatten(%43) /* ty=Tensor[(1, 25088), float32] */;
  %45 = nn.dense(%44, meta[relay.Constant][26] /* ty=Tensor[(4096, 25088), float32] */, units=4096) /* ty=Tensor[(1, 4096), float32] */;
  %46 = nn.bias_add(%45, meta[relay.Constant][27] /* ty=Tensor[(4096), float32] */, axis=-1) /* ty=Tensor[(1, 4096), float32] */;
  %47 = nn.relu(%46) /* ty=Tensor[(1, 4096), float32] */;
  %48 = nn.dropout(%47) /* ty=(Tensor[(1, 4096), float32], Tensor[(1, 4096), float32]) */;
  %49 = %48.0;
  %50 = nn.dense(%49, meta[relay.Constant][28] /* ty=Tensor[(4096, 4096), float32] */, units=4096) /* ty=Tensor[(1, 4096), float32] */;
  %51 = nn.bias_add(%50, meta[relay.Constant][29] /* ty=Tensor[(4096), float32] */, axis=-1) /* ty=Tensor[(1, 4096), float32] */;
  %52 = nn.relu(%51) /* ty=Tensor[(1, 4096), float32] */;
  %53 = nn.dropout(%52) /* ty=(Tensor[(1, 4096), float32], Tensor[(1, 4096), float32]) */;
  %54 = %53.0;
  %55 = nn.dense(%54, meta[relay.Constant][30] /* ty=Tensor[(1000, 4096), float32] */, units=1000) /* ty=Tensor[(1, 1000), float32] */;
  %56 = nn.bias_add(%55, meta[relay.Constant][31] /* ty=Tensor[(1000), float32] */, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
  nn.softmax(%56) /* ty=Tensor[(1, 1000), float32] */
}
```

And lastly, I iterate through the **tasks** gathered by **AutoTVM** and print their properties:

```
[Kernel 0/11] dense_nopack.x86 - Search space size: 208
    Output previous layer : ('TENSOR', (1, 4096), 'float32')
    Input current layer   : ('TENSOR', (1000, 4096), 'float32')
[Kernel 1/11] dense_nopack.x86 - Search space size: 169
    Output previous layer : ('TENSOR', (1, 4096), 'float32')
    Input current layer   : ('TENSOR', (4096, 4096), 'float32')
[Kernel 2/11] dense_nopack.x86 - Search space size: 390
    Output previous layer : ('TENSOR', (1, 25088), 'float32')
    Input current layer   : ('TENSOR', (4096, 25088), 'float32')
[Kernel 3/11] conv2d_NCHWc.x86 - Search space size: 800
    Output previous layer : ('TENSOR', (1, 512, 14, 14), 'float32')
    Input current layer   : ('TENSOR', (512, 512, 3, 3), 'float32')
[Kernel 4/11] conv2d_NCHWc.x86 - Search space size: 1200
    Output previous layer : ('TENSOR', (1, 512, 28, 28), 'float32')
    Input current layer   : ('TENSOR', (512, 512, 3, 3), 'float32')
[Kernel 5/11] conv2d_NCHWc.x86 - Search space size: 1080
    Output previous layer : ('TENSOR', (1, 256, 28, 28), 'float32')
    Input current layer   : ('TENSOR', (512, 256, 3, 3), 'float32')
[Kernel 6/11] conv2d_NCHWc.x86 - Search space size: 1296
    Output previous layer : ('TENSOR', (1, 256, 56, 56), 'float32')
    Input current layer   : ('TENSOR', (256, 256, 3, 3), 'float32')
[Kernel 7/11] conv2d_NCHWc.x86 - Search space size: 1152
    Output previous layer : ('TENSOR', (1, 128, 56, 56), 'float32')
    Input current layer   : ('TENSOR', (256, 128, 3, 3), 'float32')
[Kernel 8/11] conv2d_NCHWc.x86 - Search space size: 1152
    Output previous layer : ('TENSOR', (1, 128, 112, 112), 'float32')
    Input current layer   : ('TENSOR', (128, 128, 3, 3), 'float32')
[Kernel 9/11] conv2d_NCHWc.x86 - Search space size: 1008
    Output previous layer : ('TENSOR', (1, 64, 112, 112), 'float32')
    Input current layer   : ('TENSOR', (128, 64, 3, 3), 'float32')
[Kernel 10/11] conv2d_NCHWc.x86 - Search space size: 980
    Output previous layer : ('TENSOR', (1, 64, 224, 224), 'float32')
    Input current layer   : ('TENSOR', (64, 64, 3, 3), 'float32')
[Kernel 11/11] conv2d_NCHWc.x86 - Search space size: 280
    Output previous layer : ('TENSOR', (1, 3, 224, 224), 'float32')
    Input current layer   : ('TENSOR', (64, 3, 3, 3), 'float32')
```

Matching an AutoTVM task to the Relay IR is relatively straightforward. For example:

```
[Kernel 0/11] dense_nopack.x86 - Search space size: 208
    Output previous layer : ('TENSOR', (1, 4096), 'float32')
    Input current layer   : ('TENSOR', (1000, 4096), 'float32')
```

matches

```
%55 = nn.dense(%54, meta[relay.Constant][30] /* ty=Tensor[(1000, 4096), float32] */, units=1000) /* ty=Tensor[(1, 1000), float32] */;
%56 = nn.bias_add(%55, meta[relay.Constant][31] /* ty=Tensor[(1000), float32] */, axis=-1) /* ty=Tensor[(1, 1000), float32] */;
nn.softmax(%56) /* ty=Tensor[(1, 1000), float32] */
```

But I can't really associate these with the shapes the debug runtime reports for the layers. For the dense layers it is eventually possible; for the conv2ds I really don't see it. @comaniac, do you see any way to match these? I'd really appreciate your help!

Thank you & best regards,
Robert

---

[Visit Topic](https://discuss.tvm.ai/t/autotvm-selective-tuning-of-hotspots/6083/6) to respond.