Hi @tkonolige 
Sorry for the delay in this response.
I changed the target to "llvm -mcpu=cascadelake" to match the target machine
and re-ran the tuning. The inference time is now much better, under 100 ms
with both benchmark and VirtualMachineProfiler, but a 4x discrepancy remains
between the outputs of the two profilers (VirtualMachineProfiler and
debug_executor).
The outputs are attached below ([1]).
I tried ResNet-18 as well and observe the same discrepancy there.

Without graph tuning, I observe almost no discrepancy. Interestingly,
debug_executor's total inference time gets worse when I enable graph tuning,
while that of the other two improves. The outputs are attached below ([2]).
I haven't yet been able to get hold of another system to install and run these
experiments on; I will update this thread as soon as I do.
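
To narrow down where the gap comes from, the two reports in [1] can be compared operator by operator. Below is a minimal sketch, not a definitive analysis. It assumes the Hash column identifies the same fused function in both reports (the names differ between the two, e.g. `fused_...` vs `tvmgen_default_fused_...`). `parse_us` and `compare_by_hash` are hypothetical helpers, and only four rows from [1] are copied in for illustration.

```python
def parse_us(text):
    """Parse a 'Duration (us)' cell, ignoring any digit-group separators."""
    # The debug_executor report prints e.g. "1,39,559.24", so commas are
    # stripped regardless of how the digits are grouped.
    return float(text.replace(",", ""))


# Hash -> duration strings copied from the profiler_vm report in [1].
vm_report = {
    "6fb734c77ed64bde": "3,069.46",
    "10a40e9231ff15a6": "2,898.89",
    "7661eb48c0b8a7e6": "1,382.10",
    "efb9044cdd43e0b8": "15,909.52",
}

# Hash -> duration strings copied from the debug_executor report in [1].
debug_report = {
    "6fb734c77ed64bde": "1,39,559.24",
    "10a40e9231ff15a6": "1,18,024.98",
    "7661eb48c0b8a7e6": "23,051.66",
    "efb9044cdd43e0b8": "15,185.61",
}


def compare_by_hash(vm, debug):
    """Return (hash, vm_us, debug_us, debug/vm ratio), largest ratio first."""
    rows = []
    for h in vm.keys() & debug.keys():  # only hashes present in both reports
        vm_us = parse_us(vm[h])
        debug_us = parse_us(debug[h])
        rows.append((h, vm_us, debug_us, debug_us / vm_us))
    return sorted(rows, key=lambda row: row[3], reverse=True)


for h, vm_us, debug_us, ratio in compare_by_hash(vm_report, debug_report):
    print(f"{h}  vm={vm_us:>10.2f}  debug={debug_us:>10.2f}  ratio={ratio:6.1f}x")
```

Filling the two dicts with every row of both reports would show whether the gap is spread evenly across operators or comes from a few kernels.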
__________________________________________________________________________________

Outputs:
[1] With Graph Tuning
(a) profiler_vm
```
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                    Duration (us)  Percent  
Count  out_layout  Device  data_layout  kernel_layout              Hash   
layout                                                                          
                                                                             
Argument Shapes  dst_layout  weight_layout  src_layout  
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                 15,909.52    16.00  
    6     NCHW16c    cpu0      NCHW16c     OIHW16i16o  efb9044cdd43e0b8         
                                                       float32[1, 16, 14, 14, 
16], float32[16, 16, 3, 3, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 
14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                 10,522.82    10.58  
    4     NCHW32c    cpu0       NCHW8c      OIHW8i32o  0d551fd3800939e1         
                                                            float32[1, 16, 28, 
28, 8], float32[4, 16, 3, 3, 8, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 28, 
28, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                 9,095.54     9.15  
    3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  68695c5cd347ce57         
                                                           float32[1, 32, 7, 7, 
16], float32[32, 32, 3, 3, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 
7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  8,034.25     8.08  
    3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  83e0f5d1673ff2ae         
                                                            float32[1, 1, 56, 
56, 64], float32[2, 1, 3, 3, 64, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 
56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  6,451.60     6.49  
    5     NCHW16c    cpu0      NCHW16c     OIHW16i16o  c8d2fb74508242fa         
                                                       float32[1, 64, 14, 14, 
16], float32[16, 64, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 
14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_2                          6,219.45     6.25  
    5     NCHW16c    cpu0       NCHW4c      OIHW4i16o  991e77362efe315d         
                                                       float32[1, 64, 14, 14, 
4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 
14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_1                          4,069.38     4.09  
    3     NCHW64c    cpu0      NCHW32c     OIHW32i64o  b8f45dade76ef8ee         
                                                          float32[1, 4, 28, 28, 
32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 28, 
28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                  3,627.03     3.65  
    3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  435cfe42fcb8d0b0         
                                                            float32[1, 8, 28, 
28, 64], float32[4, 8, 1, 1, 64, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 
28, 28, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add                            3,069.46     3.09  
    2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6fb734c77ed64bde         
                                                       float32[1, 2, 56, 56, 
32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 
56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    2,898.89     2.92  
    1     NCHW16c    cpu0       NCHW3c      OIHW3i16o  10a40e9231ff15a6         
                                                          float32[1, 1, 224, 
224, 3], float32[4, 1, 7, 7, 3, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 
112, 112, 16]                                         
fused_nn_contrib_conv2d_NCHWc_3                              2,659.84     2.67  
    1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  9c3ea371f8ec4054         
                                                                                
 float32[1, 64, 14, 14, 16], float32[128, 64, 1, 1, 16, 16], float32[1, 128, 7, 
7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 2,592.41     2.61  
    2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  1cc8a4dccc794a64         
                                                         float32[1, 128, 7, 7, 
16], float32[32, 128, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 
7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_3                          2,587.97     2.60  
    2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  528b9cb523882d7e         
                                                        float32[1, 16, 7, 7, 
32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 
7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_1                              2,568.47     2.58  
    1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  9b9c1d5fc56b0353         
                                                                                
   float32[1, 16, 56, 56, 16], float32[8, 16, 1, 1, 16, 64], float32[1, 8, 28, 
28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_2                              2,560.30     2.57  
    1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  371a9e61ecaeecce         
                                                                                
   float32[1, 8, 28, 28, 64], float32[64, 8, 1, 1, 64, 16], float32[1, 64, 14, 
14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                  2,393.13     2.41  
    2     NCHW64c    cpu0      NCHW16c     OIHW16i64o  850ecaa157c95aac         
                                                          float32[1, 16, 56, 
56, 16], float32[1, 16, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 
56, 56, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                1,519.12     1.53  
    1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  abe40a1f08b34bad         
                             float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 
32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 
56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc                                1,382.10     1.39  
    1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  7661eb48c0b8a7e6         
                                                                                
   float32[1, 4, 56, 56, 16], float32[16, 4, 1, 1, 16, 16], float32[1, 16, 56, 
56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              1,319.25     1.33  
    1     NCHW64c    cpu0      NCHW32c     OIHW32i64o  88bbb32f8f542f98         
                                 float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 
32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 
28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              1,299.49     1.31  
    1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  c7b912640028a9e2         
                             float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 
4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 
14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       1,252.95     1.26  
    1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  21cb6d538731ba92         
  float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 
7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 
7, 7, 16]                                         
fused_add_nn_relu                                              823.04     0.83  
    2                cpu0                              e907ce81104cda7a         
                                                                                
      float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 56, 
56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                   759.67     0.76  
    1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737         
                                                         float32[1, 64, 14, 14, 
16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 
7, 16]                                         
fused_nn_contrib_dense_pack_add                                710.21     0.71  
    1                cpu0                              7641a0cce9852143         
                                                                                
           float32[1, 2048], float32[40, 2048, 25], float32[1, 1000], 
float32[1, 1000]                      NC25n              
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                    693.32     0.70  
    1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  dc31662fedbb8185         
                                                         float32[1, 8, 28, 28, 
64], float32[16, 8, 1, 1, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 
14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                    656.53     0.66  
    1    NCHW128c    cpu0      NCHW16c    OIHW16i128o  9b01f6479b89fd68         
                                                       float32[1, 16, 56, 56, 
16], float32[1, 16, 1, 1, 16, 128], float32[1, 1, 1, 1, 128], float32[1, 1, 28, 
28, 128]                                         
fused_add_nn_relu_1                                            631.05     0.63  
    3                cpu0                              0e82013d73aa68c1         
                                                                                
         float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 
28, 64]                                         
fused_add_nn_relu_2                                            542.71     0.55  
    5                cpu0                              f12067172f61c850         
                                                                                
      float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 
14, 16]                                         
fused_nn_max_pool2d_add_nn_relu                                364.59     0.37  
    1                cpu0                              6f701a4fa071030f  
NCHW16c                                                                         
              float32[1, 4, 112, 112, 16], float32[1, 4, 1, 1, 16], float32[1, 
4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                    330.20     0.33  
    1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  0f7bbb0e363c360c         
                                                            float32[1, 4, 56, 
56, 16], float32[1, 4, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 
56, 56, 64]                                         
fused_layout_transform_1                                       188.19     0.19  
    3                cpu0                              b8cbb72b4035894d         
                                                                                
                                  float32[1, 4, 28, 28, 32], float32[1, 16, 28, 
28, 8]      NCHW8c                    NCHW32c  
fused_layout_transform_2                                       172.13     0.17  
    6                cpu0                              f5e631fb93d23d4d         
                                                                                
                                 float32[1, 16, 14, 14, 16], float32[1, 64, 14, 
14, 4]      NCHW4c                    NCHW16c  
fused_add_nn_relu_3                                            106.41     0.11  
    2                cpu0                              5d16c15878cc73d4         
                                                                                
       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 
7, 16]                                         
fused_add_layout_transform                                      96.21     0.10  
    1                cpu0                              69355d3cc810f874         
                                                                                
                 float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 
224, 3]      NCHW3c                       NCHW  
fused_nn_global_avg_pool2d                                      56.33     0.06  
    1                cpu0                              f18307e2786f4cb3  
NCHW16c                                                                         
                                         float32[1, 128, 7, 7, 16], float32[1, 
128, 1, 1, 16]                                         
fused_layout_transform                                          52.16     0.05  
    1                cpu0                              2c5d64d5f9faa001         
                                                                                
                                 float32[1, 1, 28, 28, 128], float32[1, 16, 28, 
28, 8]      NCHW8c                   NCHW128c  
fused_layout_transform_3                                        48.26     0.05  
    3                cpu0                              add43c0d2d8a8a3c         
                                                                                
                                    float32[1, 32, 7, 7, 16], float32[1, 16, 7, 
7, 32]     NCHW32c                    NCHW16c  
fused_nn_softmax                                                 9.76     0.01  
    1                cpu0                              ca61e79ea24e53f0         
                                                                                
                                                    float32[1, 1000], 
float32[1, 1000]                                         
fused_layout_transform_nn_batch_flatten                          1.73     0.00  
    1                cpu0                              2db99463d18696a4         
                                                                                
                                           float32[1, 128, 1, 1, 16], 
float32[1, 2048]        NCHW                    NCHW16c  
----------                                                                      
                                                                                
                                                                                
                                                                                
                                               
Sum                                                         98,275.48    98.83  
   84                                                                           
                                                                                
                                                                                
                                               
Total                                                       99,441.43           
    1                cpu0                                                       
                                                   
```

(b) debug_executor
```
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                                   Duration 
(us)  Percent  Count  out_layout  Device  data_layout  kernel_layout            
  Hash   layout                                                                 
                                                                                
      Argument Shapes  dst_layout  weight_layout  src_layout  
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3                       
1,39,559.24    36.43      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
6fb734c77ed64bde                                                                
float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 
16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12              
1,18,024.98    30.81      1     NCHW16c    cpu0       NCHW3c      OIHW3i16o  
10a40e9231ff15a6                                                                
   float32[1, 1, 224, 224, 3], float32[4, 1, 7, 7, 3, 16], float32[1, 4, 1, 1, 
16], float32[1, 4, 112, 112, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc                               
23,051.66     6.02      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
7661eb48c0b8a7e6                                                                
                            float32[1, 4, 56, 56, 16], float32[16, 4, 1, 1, 16, 
16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                 
15,185.61     3.96      6     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
efb9044cdd43e0b8                                                                
float32[1, 16, 14, 14, 16], float32[16, 16, 3, 3, 16, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                 
13,328.36     3.48      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  
83e0f5d1673ff2ae                                                                
     float32[1, 1, 56, 56, 64], float32[2, 1, 3, 3, 64, 32], float32[1, 2, 1, 
1, 32], float32[1, 2, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                
13,159.49     3.44      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  
0f7bbb0e363c360c                                                                
     float32[1, 4, 56, 56, 16], float32[1, 4, 1, 1, 16, 64], float32[1, 1, 1, 
1, 64], float32[1, 1, 56, 56, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                 
10,205.32     2.66      4     NCHW32c    cpu0       NCHW8c      OIHW8i32o  
0d551fd3800939e1                                                                
     float32[1, 16, 28, 28, 8], float32[4, 16, 3, 3, 8, 32], float32[1, 4, 1, 
1, 32], float32[1, 4, 28, 28, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    
7,727.92     2.02      3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
68695c5cd347ce57                                                                
    float32[1, 32, 7, 7, 16], float32[32, 32, 3, 3, 16, 16], float32[1, 32, 1, 
1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1                          
5,840.79     1.52      5     NCHW16c    cpu0       NCHW4c      OIHW4i16o  
991e77362efe315d                                                                
float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 
16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                  
5,746.35     1.50      5     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
c8d2fb74508242fa                                                                
float32[1, 64, 14, 14, 16], float32[16, 64, 1, 1, 16, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2                          
3,745.35     0.98      3     NCHW64c    cpu0      NCHW32c     OIHW32i64o  
b8f45dade76ef8ee                                                                
   float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 
28, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                  
3,425.00     0.89      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  
435cfe42fcb8d0b0                                                                
     float32[1, 8, 28, 28, 64], float32[4, 8, 1, 1, 64, 32], float32[1, 4, 1, 
1, 32], float32[1, 4, 28, 28, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2                              
2,508.48     0.65      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  
371a9e61ecaeecce                                                                
                            float32[1, 8, 28, 28, 64], float32[64, 8, 1, 1, 64, 
16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                 
2,400.83     0.63      2     NCHW64c    cpu0      NCHW16c     OIHW16i64o  
850ecaa157c95aac                                                                
   float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 64], float32[1, 1, 1, 
1, 64], float32[1, 1, 56, 56, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1                              
2,396.47     0.63      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  
9b9c1d5fc56b0353                                                                
                            float32[1, 16, 56, 56, 16], float32[8, 16, 1, 1, 
16, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                  
2,271.00     0.59      2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
1cc8a4dccc794a64                                                                
  float32[1, 128, 7, 7, 16], float32[32, 128, 1, 1, 16, 16], float32[1, 32, 1, 
1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add                            
2,260.06     0.59      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
528b9cb523882d7e                                                                
 float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 
7, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3                              
2,240.88     0.59      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
9c3ea371f8ec4054                                                                
                          float32[1, 64, 14, 14, 16], float32[128, 64, 1, 1, 
16, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              
1,401.25     0.37      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
abe40a1f08b34bad                                      float32[1, 2, 56, 56, 
32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 
1, 1, 16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              
1,249.88     0.33      1     NCHW64c    cpu0      NCHW32c     OIHW32i64o  
88bbb32f8f542f98                                          float32[1, 4, 28, 28, 
32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 1, 
1, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       
1,220.86     0.32      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
21cb6d538731ba92           float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 
16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 
1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                
1,160.16     0.30      1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  
c7b912640028a9e2                                      float32[1, 64, 14, 14, 
4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 
1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                    
599.86     0.16      1    NCHW128c    cpu0      NCHW16c    OIHW16i128o  
9b01f6479b89fd68                                                                
float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 128], float32[1, 1, 1, 1, 
128], float32[1, 1, 28, 28, 128]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                    
579.10     0.15      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  
dc31662fedbb8185                                                                
  float32[1, 8, 28, 28, 64], float32[16, 8, 1, 1, 64, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                    
571.03     0.15      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
8d07031ff51d0737                                                                
  float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 
1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_add_nn_relu_3                                            
519.35     0.14      2                cpu0                              
e907ce81104cda7a                                                                
                               float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_dense_pack_add                                
488.16     0.13      1                cpu0                              
7641a0cce9852143                                                                
                                    float32[1, 2048], float32[40, 2048, 25], 
float32[1, 1000], float32[1, 1000]                      NC25n              
tvmgen_default_fused_add_nn_relu_2                                            
360.30     0.09      3                cpu0                              
0e82013d73aa68c1                                                                
                                  float32[1, 8, 28, 28, 64], float32[1, 8, 1, 
1, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_max_pool2d_add_nn_relu                                
342.65     0.09      1                cpu0                              
6f701a4fa071030f  NCHW16c                                                       
                                float32[1, 4, 112, 112, 16], float32[1, 4, 1, 
1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_add_nn_relu_1                                            
291.22     0.08      5                cpu0                              
f12067172f61c850                                                                
                               float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 
16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_layout_transform_2                                       
106.25     0.03      3                cpu0                              
b8cbb72b4035894d                                                                
                                                           float32[1, 4, 28, 
28, 32], float32[1, 16, 28, 28, 8]      NCHW8c                    NCHW32c  
tvmgen_default_fused_add_layout_transform                                      
76.45     0.02      1                cpu0                              
69355d3cc810f874                                                                
                                          float32[1, 3, 224, 224], float32[3, 
1, 1], float32[1, 1, 224, 224, 3]      NCHW3c                       NCHW  
tvmgen_default_fused_layout_transform_1                                        
68.45     0.02      6                cpu0                              
f5e631fb93d23d4d                                                                
                                                          float32[1, 16, 14, 
14, 16], float32[1, 64, 14, 14, 4]      NCHW4c                    NCHW16c  
tvmgen_default_fused_add_nn_relu                                               
51.61     0.01      2                cpu0                              
5d16c15878cc73d4                                                                
                                float32[1, 128, 7, 7, 16], float32[1, 128, 1, 
1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_global_avg_pool2d                                      
46.06     0.01      1                cpu0                              
f18307e2786f4cb3  NCHW16c                                                       
                                                           float32[1, 128, 7, 
7, 16], float32[1, 128, 1, 1, 16]                                         
tvmgen_default_fused_layout_transform_3                                        
36.66     0.01      1                cpu0                              
2c5d64d5f9faa001                                                                
                                                          float32[1, 1, 28, 28, 
128], float32[1, 16, 28, 28, 8]      NCHW8c                   NCHW128c  
tvmgen_default_fused_layout_transform                                          
11.41     0.00      3                cpu0                              
add43c0d2d8a8a3c                                                                
                                                             float32[1, 32, 7, 
7, 16], float32[1, 16, 7, 7, 32]     NCHW32c                    NCHW16c  
tvmgen_default_fused_nn_softmax                                                 
9.50     0.00      1                cpu0                              
ca61e79ea24e53f0                                                                
                                                                             
float32[1, 1000], float32[1, 1000]                                         
tvmgen_default_fused_layout_transform_nn_batch_flatten                          
1.05     0.00      1                cpu0                              
2db99463d18696a4                                                                
                                                                    float32[1, 
128, 1, 1, 16], float32[1, 2048]        NCHW                    NCHW16c  
----------
Sum      3,82,269.07    99.80     84
Total    3,83,036.30               1                cpu0
```

(c) benchmark
```
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, 
workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', 
(1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest 
context. A fallback configuration is used, which may bring great performance 
regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, 
workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 
2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A 
fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better 
performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  95.1157      95.0706      95.2259      95.0505       0.0784   
```
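For reference, the size of the discrepancy can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two numbers are copied by hand from the outputs above (the `Total` row of the profile table just before the benchmark output, and the benchmark's `mean (ms)`). The variable names are mine and are not part of any TVM API.

```python
# Values copied by hand from the outputs above (not read from TVM).
profile_total_us = 383_036.30   # "Total" row of the profile table, printed as 3,83,036.30
benchmark_mean_ms = 95.1157     # "mean (ms)" from the benchmark summary

# The profile table reports microseconds; the benchmark reports milliseconds.
profile_total_ms = profile_total_us / 1000.0

# How many times slower the profiled run looks compared to the benchmark.
ratio = profile_total_ms / benchmark_mean_ms

print(f"profile total : {profile_total_ms:.2f} ms")
print(f"benchmark mean: {benchmark_mean_ms:.2f} ms")
print(f"ratio         : {ratio:.2f}x")
```

With these inputs the ratio comes out at roughly 4x, which is the gap I am describing above.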




