@tkonolige Thank you for responding.
I just want to find out how much time is spent on data layout transformations
while running inference on ResNet-50. `profiler_vm` seems to report a much lower
inference cost (1) than `debug_executor` (2). Doesn't this contradict your
statement that `profiler_vm` may be slower than the graph executor?
I also ran a benchmark via `tvm.contrib.graph_executor`:
```
import tvm
from tvm import autotvm, relay
from tvm.contrib import graph_executor as runtime

# mod, params, target, dev, data and opt_sch_file are defined earlier (not shown).
with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build_module.build(mod, target=target, params=params)
        module = runtime.GraphModule(lib["default"](dev))
        module.set_input("data", data)
        print("Evaluate inference time cost...")
        print(
            module.benchmark(
                dev, func_name="main", number=100, repeat=3, end_to_end=True
            )
        )
```
The inference cost I get this way (3) is always close to, but lower than, (1).
Do you have any idea why that is?
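
To narrow down where (1) and (2) diverge, the rows can be matched by the Hash column, which seems to identify the same fused kernel in both reports (the argument shapes agree for each hash I checked). The snippet below is only a rough sketch: the durations are copied by hand from a few rows of the two reports listed further down, and the helper name `slowdowns` is made up for illustration.

```python
# Durations in microseconds, copied from reports (1) and (2), keyed by Hash.
profiler_vm_us = {
    "2f8575d36cac57f0": 8112.57,
    "18ea4e7c768c292e": 3905.42,
    "b2d690588ecaac96": 15697.88,
    "7ff40af88acd710e": 3776.76,
    "5c16c122a657ba21": 38648.93,
}
debug_executor_us = {
    "2f8575d36cac57f0": 568263.92,
    "18ea4e7c768c292e": 175988.78,
    "b2d690588ecaac96": 82241.79,
    "7ff40af88acd710e": 67905.70,
    "5c16c122a657ba21": 39639.91,
}


def slowdowns(baseline, other):
    """Return (hash, other / baseline) for shared hashes, largest ratio first."""
    shared = baseline.keys() & other.keys()
    ratios = [(h, other[h] / baseline[h]) for h in shared]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranked = slowdowns(profiler_vm_us, debug_executor_us)
for kernel_hash, ratio in ranked:
    print(f"{kernel_hash}: {ratio:.1f}x")
```

For these five rows the gap is concentrated in a few convolutions (the first one is roughly 70x slower in (2)), while others such as `5c16c122a657ba21` take almost the same time in both reports.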

The Outputs:
(1) [profiler_vm]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better 
performance. Use DEBUG logging level to see more details.
Name                                                    Duration (us)  Percent  
 layout  Count  out_layout  Device  data_layout  kernel_layout              
Hash                                                                            
                                                                           
Argument Shapes  src_layout  dst_layout  weight_layout  
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                38,648.93    14.19  
             5     NCHW16c    cpu0      NCHW64c     OIHW64i16o  
5c16c122a657ba21                                                         
float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                 31,069.39    11.41  
             4      NCHW8c    cpu0      NCHW16c      OIHW16i8o  
f2c6de1cbe5c0ddb                                                            
float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 
8], float32[1, 16, 28, 28, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                23,726.42     8.71  
             3      NCHW8c    cpu0       NCHW2c       OIHW2i8o  
cb108aaf00eff9e2                                                              
float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 
8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                18,153.16     6.66  
             5      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  
e4cba4831bd46d2c                                                        
float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 
1, 8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 15,697.88     5.76  
             2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
b2d690588ecaac96                                                            
float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 
16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_3                         14,098.72     5.18  
             4     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
84bec82add215ebe                                                     float32[1, 
16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], 
float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 10,840.88     3.98  
             3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
d930aa7bf46c34e1                                                          
float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 
16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_1                         10,638.57     3.91  
             3     NCHW16c    cpu0       NCHW8c      OIHW8i16o  
6beba43d92784786                                                       
float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 
16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    8,112.57     2.98  
             1      NCHW8c    cpu0       NCHW3c       OIHW3i8o  
2f8575d36cac57f0                                                             
float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], 
float32[1, 8, 112, 112, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  7,847.28     2.88  
             1      NCHW8c    cpu0      NCHW16c      OIHW16i8o  
7baee5c8a4d8e4ab                                                          
float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 
8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  7,684.11     2.82  
             1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
25fd1c3d9d4e561e                                                            
float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 
16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add                            7,625.64     2.80  
             2     NCHW32c    cpu0      NCHW16c     OIHW16i32o  
667036afd5deee1b                                                          
float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 
32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                  7,622.32     2.80  
             2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
6e49d3c836077ac7                                                            
float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 
16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_2                              7,530.83     2.76  
             1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
b6e66601adaeb1e3                                                                
                 float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], 
float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_4                          7,305.51     2.68  
             2     NCHW16c    cpu0       NCHW4c      OIHW4i16o  
d0d1536228842867                                                        
float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 
16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_3                              7,303.69     2.68  
             1      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  
493c374dd5e37c2b                                                                
                 float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], 
float32[1, 256, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14                 7,199.44     2.64  
             2      NCHW8c    cpu0    NCHW2048c    OIHW2048i8o  
af5e7bf563de2757                                                            
float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 
8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_1                              7,185.16     2.64  
             1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
5e7a95757d65e24e                                                                
                   float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], 
float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                3,905.42     1.43  
             1     NCHW32c    cpu0      NCHW16c     OIHW16i32o  
18ea4e7c768c292e                                 float32[1, 4, 56, 56, 16], 
float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 
32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc                                3,776.76     1.39  
             1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  
7ff40af88acd710e                                                                
                       float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], 
float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       3,693.25     1.36  
             1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  
a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], 
float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 
16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              3,616.06     1.33  
             1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  
faa415ce8e443d42                             float32[1, 16, 28, 28, 8], 
float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 
16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_2                          3,601.05     1.32  
             1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  
c3c48546ccd1c8e4                                                       
float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 
16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              3,509.62     1.29  
             1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
237b36f60eadc660                           float32[1, 16, 14, 14, 16], 
float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 
16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 2,119.10     0.78  
             1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
8d07031ff51d0737                                                         
float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 
16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                  1,969.95     0.72  
             1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  
8ec1781e87f7f62e                                                       
float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 
16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  1,869.16     0.69  
             1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  
39975a03990f0ed6                                                            
float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 
16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                    920.43     0.34  
             1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  
ce29dd2da9289ac4                                                              
float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], 
float32[1, 2, 56, 56, 32]                                         
fused_add_nn_relu_layout_transform                             814.00     0.30  
             5                cpu0                              
7590737f314ee1d9                                                                
                     float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], 
float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_nn_relu                                              751.40     0.28  
             2                cpu0                              
f6724216088f2bf7                                                                
                         float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], 
float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_dense_pack_add                                658.90     0.24  
             1                cpu0                              
ced18cccebfa2ada                                                                
                           float32[1, 2048], float32[125, 2048, 8], float32[1, 
1000], float32[1, 1000]                                   NC8n  
fused_add_nn_relu_1                                            624.30     0.23  
             3                cpu0                              
848825acfc73218b                                                                
                      float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], 
float32[1, 32, 28, 28, 16]                                         
fused_nn_max_pool2d_add_nn_relu                                378.72     0.14  
 NCHW8c      1                cpu0                              
4883943910905d24                                                                
                          float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], 
float32[1, 8, 56, 56, 8]                                         
fused_layout_transform                                         173.49     0.06  
             5                cpu0                              
0693edb3d97dc77f                                                                
                                                  float32[1, 32, 14, 14, 8], 
float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
fused_add_nn_relu_layout_transform_1                           172.54     0.06  
             2                cpu0                              
468080b095af509a                                                                
                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], 
float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
fused_layout_transform_3                                       138.92     0.05  
             1                cpu0                              
6dda5720a553f260                                                                
                                               float32[1, 64, 14, 14, 16], 
float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_layout_transform                                      90.72     0.03  
             1                cpu0                              
69355d3cc810f874                                                                
                                 float32[1, 3, 224, 224], float32[3, 1, 1], 
float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
fused_nn_global_avg_pool2d                                      83.09     0.03  
NCHW16c      1                cpu0                              
f18307e2786f4cb3                                                                
                                                  float32[1, 128, 7, 7, 16], 
float32[1, 128, 1, 1, 16]                                         
fused_layout_transform_4                                        79.73     0.03  
             1                cpu0                              
aad3e266e27c5054                                                                
                                                   float32[1, 256, 7, 7, 8], 
float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
fused_layout_transform_2                                        50.75     0.02  
             3                cpu0                              
bd0b0c2ae84f7e09                                                                
                                                     float32[1, 64, 7, 7, 8], 
float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
fused_layout_transform_5                                        39.88     0.01  
             2                cpu0                              
69f132fa7e1d6749                                                                
                                                     float32[1, 64, 7, 7, 8], 
float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
fused_layout_transform_1                                        14.62     0.01  
             1                cpu0                              
9bd937910d443787                                                                
                                                    float32[1, 32, 7, 7, 16], 
float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
fused_nn_softmax                                                 7.80     0.00  
             1                cpu0                              
ca61e79ea24e53f0                                                                
                                                                    float32[1, 
1000], float32[1, 1000]                                         
fused_layout_transform_nn_batch_flatten                          1.41     0.00  
             1                cpu0                              
2db99463d18696a4                                                                
                                                           float32[1, 128, 1, 
1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------                                                                      
                                                                                
                                                                                
                                                                                
                                               
Sum                                                       2,71,351.60    99.61  
            84                                                                  
                                                                                
                                                                                
                                               
Total                                                     2,72,418.15           
             1                cpu0                                              
                                                                                
                                                                                
                                               
```
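
For the layout-transformation question, the relevant rows of report (1) can be totalled by hand. This is a minimal sketch, assuming the durations are copied from the table above as strings; `parse_us` and the regular expression are my own helpers, not part of TVM. Note that kernels such as `fused_add_nn_relu_layout_transform` also do non-layout work, so the first total is an upper bound and the "pure" total only counts stand-alone `fused_layout_transform` kernels.

```python
import re


def parse_us(text):
    """Parse a duration such as '2,72,418.15', ignoring digit-group commas."""
    return float(text.replace(",", ""))


# (name, duration in us) pairs copied from report (1).
rows = [
    ("fused_add_nn_relu_layout_transform", "814.00"),
    ("fused_layout_transform", "173.49"),
    ("fused_add_nn_relu_layout_transform_1", "172.54"),
    ("fused_layout_transform_3", "138.92"),
    ("fused_add_layout_transform", "90.72"),
    ("fused_layout_transform_4", "79.73"),
    ("fused_layout_transform_2", "50.75"),
    ("fused_layout_transform_5", "39.88"),
    ("fused_layout_transform_1", "14.62"),
    ("fused_layout_transform_nn_batch_flatten", "1.41"),
]
total_us = parse_us("2,72,418.15")  # the "Total" row of report (1)

# Stand-alone layout transforms: fused_layout_transform or fused_layout_transform_<n>.
pure = re.compile(r"fused_layout_transform(_\d+)?")

any_us = sum(parse_us(d) for name, d in rows if "layout_transform" in name)
pure_us = sum(parse_us(d) for name, d in rows if pure.fullmatch(name))

print(f"any layout_transform:  {any_us:.2f} us ({100 * any_us / total_us:.2f}%)")
print(f"pure layout_transform: {pure_us:.2f} us ({100 * pure_us / total_us:.2f}%)")
```

With the numbers from (1) this comes to about 0.58% of the total for all kernels that contain a layout transform, and about 0.18% for the stand-alone ones.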

(2): [debug_executor]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better 
performance. Use DEBUG logging level to see more details.
Name                                                                   Duration 
(us)  Percent   layout  Count  out_layout  Device  data_layout  kernel_layout   
           Hash                                                                 
                                                                                
      Argument Shapes  src_layout  dst_layout  weight_layout  
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14              
5,68,263.92    48.76               1      NCHW8c    cpu0       NCHW3c       
OIHW3i8o  2f8575d36cac57f0                                                      
       float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 
1, 8], float32[1, 8, 112, 112, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2           
1,75,988.78    15.10               1     NCHW32c    cpu0      NCHW16c     
OIHW16i32o  18ea4e7c768c292e                                 float32[1, 4, 56, 
56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 
1, 1, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                
82,241.79     7.06               2     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  b2d690588ecaac96                                                    
        float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 
1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc                               
67,905.70     5.83               1     NCHW32c    cpu0       NCHW8c      
OIHW8i32o  7ff40af88acd710e                                                     
                                  float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 
8, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                 
39,639.91     3.40               5     NCHW16c    cpu0      NCHW64c     
OIHW64i16o  5c16c122a657ba21                                                    
     float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 
1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 
31,242.14     2.68               4      NCHW8c    cpu0      NCHW16c      
OIHW16i8o  f2c6de1cbe5c0ddb                                                     
       float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 
1, 1, 8], float32[1, 16, 28, 28, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_4                         
29,317.11     2.52               2     NCHW32c    cpu0      NCHW16c     
OIHW16i32o  667036afd5deee1b                                                    
      float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 
56, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu                   
23,174.76     1.99               3      NCHW8c    cpu0       NCHW2c       
OIHW2i8o  cb108aaf00eff9e2                                                      
        float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 
1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 
18,815.10     1.61               5      NCHW8c    cpu0    NCHW1024c    
OIHW1024i8o  e4cba4831bd46d2c                                                   
     float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 
1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1                         
14,143.22     1.21               4     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  84bec82add215ebe                                                    
 float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 
14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                 
10,807.49     0.93               3     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  d930aa7bf46c34e1                                                    
      float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 
1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3                         
10,635.87     0.91               3     NCHW16c    cpu0       NCHW8c      
OIHW8i16o  6beba43d92784786                                                     
  float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 
28, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                 
8,887.56     0.76               1     NCHW32c    cpu0       NCHW8c      
OIHW8i32o  ce29dd2da9289ac4                                                     
         float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 
1, 32], float32[1, 2, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                 
8,865.53     0.76               2     NCHW16c    cpu0      NCHW32c     
OIHW32i16o  6e49d3c836077ac7                                                    
        float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 
1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  
8,017.28     0.69               1      NCHW8c    cpu0      NCHW16c      
OIHW16i8o  7baee5c8a4d8e4ab                                                     
     float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 
1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 
7,585.56     0.65               1     NCHW16c    cpu0      NCHW32c     
OIHW32i16o  25fd1c3d9d4e561e                                                    
        float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 
1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2                              
7,442.40     0.64               1     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  b6e66601adaeb1e3                                                    
                             float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 
16, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                  
7,293.48     0.63               2      NCHW8c    cpu0    NCHW2048c    
OIHW2048i8o  af5e7bf563de2757                                                   
         float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 
64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3                              
7,140.03     0.61               1      NCHW8c    cpu0    NCHW1024c    
OIHW1024i8o  493c374dd5e37c2b                                                   
                              float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 
1, 1024, 8], float32[1, 256, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1                              
7,041.60     0.60               1     NCHW16c    cpu0      NCHW32c     
OIHW32i16o  5e7a95757d65e24e                                                    
                               float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 
32, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add                            
6,836.18     0.59               2     NCHW16c    cpu0       NCHW4c      
OIHW4i16o  d0d1536228842867                                                     
   float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 
7, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2                          
3,727.41     0.32               1     NCHW16c    cpu0       NCHW8c      
OIHW8i16o  c3c48546ccd1c8e4                                                     
  float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 
14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              
3,596.31     0.31               1     NCHW16c    cpu0       NCHW8c      
OIHW8i16o  faa415ce8e443d42                             float32[1, 16, 28, 28, 
8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 
1, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       
3,468.59     0.30               1     NCHW16c    cpu0       NCHW4c      
OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 
4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 
1, 1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                
3,440.23     0.30               1     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  237b36f60eadc660                           float32[1, 16, 14, 14, 
16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 
1, 1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  
3,144.19     0.27               1     NCHW16c    cpu0      NCHW32c     
OIHW32i16o  39975a03990f0ed6                                                    
        float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 
1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  
1,997.84     0.17               1     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  8d07031ff51d0737                                                    
     float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 
1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                  
1,783.56     0.15               1     NCHW16c    cpu0      NCHW16c     
OIHW16i16o  8ec1781e87f7f62e                                                    
   float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 
1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_add_nn_relu_1                                            
473.00     0.04               2                cpu0                             
 f6724216088f2bf7                                                               
                          float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], 
float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_add_nn_relu                                              
338.92     0.03               3                cpu0                             
 848825acfc73218b                                                               
                       float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], 
float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_add_nn_relu_layout_transform_1                           
286.62     0.02               5                cpu0                             
 7590737f314ee1d9                                                               
                      float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], 
float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_nn_contrib_dense_pack_add                                
265.74     0.02               1                cpu0                             
 ced18cccebfa2ada                                                               
                            float32[1, 2048], float32[125, 2048, 8], float32[1, 
1000], float32[1, 1000]                                   NC8n  
tvmgen_default_fused_nn_max_pool2d_add_nn_relu                                
251.56     0.02   NCHW8c      1                cpu0                             
 4883943910905d24                                                               
                           float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], 
float32[1, 8, 56, 56, 8]                                         
tvmgen_default_fused_layout_transform_3                                       
132.62     0.01               5                cpu0                             
 0693edb3d97dc77f                                                               
                                                   float32[1, 32, 14, 14, 8], 
float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
tvmgen_default_fused_nn_global_avg_pool2d                                      
69.42     0.01  NCHW16c      1                cpu0                              
f18307e2786f4cb3                                                                
                                                  float32[1, 128, 7, 7, 16], 
float32[1, 128, 1, 1, 16]                                         
tvmgen_default_fused_layout_transform_4                                        
60.92     0.01               1                cpu0                              
aad3e266e27c5054                                                                
                                                   float32[1, 256, 7, 7, 8], 
float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
tvmgen_default_fused_layout_transform_5                                        
58.94     0.01               1                cpu0                              
6dda5720a553f260                                                                
                                               float32[1, 64, 14, 14, 16], 
float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_add_layout_transform                                      
56.01     0.00               1                cpu0                              
69355d3cc810f874                                                                
                                 float32[1, 3, 224, 224], float32[3, 1, 1], 
float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
tvmgen_default_fused_add_nn_relu_layout_transform                              
54.40     0.00               2                cpu0                              
468080b095af509a                                                                
                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], 
float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
tvmgen_default_fused_layout_transform_1                                        
42.67     0.00               2                cpu0                              
69f132fa7e1d6749                                                                
                                                     float32[1, 64, 7, 7, 8], 
float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
tvmgen_default_fused_layout_transform                                          
33.90     0.00               3                cpu0                              
bd0b0c2ae84f7e09                                                                
                                                     float32[1, 64, 7, 7, 8], 
float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
tvmgen_default_fused_layout_transform_2                                        
19.34     0.00               1                cpu0                              
9bd937910d443787                                                                
                                                    float32[1, 32, 7, 7, 16], 
float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
tvmgen_default_fused_nn_softmax                                                 
7.03     0.00               1                cpu0                              
ca61e79ea24e53f0                                                                
                                                                    float32[1, 
1000], float32[1, 1000]                                         
tvmgen_default_fused_layout_transform_nn_batch_flatten                          
0.96     0.00               1                cpu0                              
2db99463d18696a4                                                                
                                                           float32[1, 128, 1, 
1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------
Sum      11,64,595.59    99.94              84
Total    11,65,326.68                        1                cpu0
```
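
Since the goal is the time spent on layout transformations, here is a minimal sketch of how it could be totalled from rows like the ones above. It is plain Python and does not call TVM. The `ROWS` list is copied by hand from the part of the table shown here, so any `layout_transform` rows printed earlier in the table are not included. The helper name `layout_transform_time` and the split into "pure" and "fused" kernels are my own assumptions for illustration, not something the profiler reports.

```python
import re

# (kernel name, Duration in us) copied from the rows printed above.
# Only rows whose name mentions layout_transform are listed.
ROWS = [
    ("tvmgen_default_fused_add_nn_relu_layout_transform_1", 286.62),
    ("tvmgen_default_fused_layout_transform_3", 132.62),
    ("tvmgen_default_fused_layout_transform_4", 60.92),
    ("tvmgen_default_fused_layout_transform_5", 58.94),
    ("tvmgen_default_fused_add_layout_transform", 56.01),
    ("tvmgen_default_fused_add_nn_relu_layout_transform", 54.40),
    ("tvmgen_default_fused_layout_transform_1", 42.67),
    ("tvmgen_default_fused_layout_transform", 33.90),
    ("tvmgen_default_fused_layout_transform_2", 19.34),
    ("tvmgen_default_fused_layout_transform_nn_batch_flatten", 0.96),
]

PREFIX = "tvmgen_default_fused_"
# A "pure" kernel is layout_transform alone, optionally with a numeric suffix.
PURE = re.compile(r"layout_transform(_\d+)?")


def layout_transform_time(rows):
    """Sum durations of kernels that are, or contain, a layout_transform."""
    pure = fused = 0.0
    for name, duration_us in rows:
        if "layout_transform" not in name:
            continue
        op = name[len(PREFIX):] if name.startswith(PREFIX) else name
        if PURE.fullmatch(op):
            pure += duration_us   # kernel does nothing but the transform
        else:
            fused += duration_us  # transform is fused with other ops
    return {"pure_us": pure, "fused_us": fused}


totals = layout_transform_time(ROWS)
print(totals)
```

For the fused kernels the duration also covers the other fused ops (add, relu, batch_flatten), so the fused figure is only an upper bound on the transform cost in those kernels.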

(3) [benchmark]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', 
('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 
'float32') is missing in ApplyGraphBest context. A fallback configuration is 
used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better 
performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  269.9458     270.0297     270.0697     269.7381      0.1478   
```
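
To put the two numbers side by side: the Total row of the table above is printed in microseconds with comma digit grouping, while the benchmark summary is in milliseconds. A small sketch of the conversion follows. `parse_grouped` is a hypothetical helper written for this post, not a TVM function.

```python
def parse_grouped(text):
    """Parse a number printed with comma digit grouping, e.g. '11,65,326.68'."""
    return float(text.replace(",", ""))


# Total row of the profiler table above, converted from us to ms.
profiler_total_ms = parse_grouped("11,65,326.68") / 1000.0
# mean (ms) from the benchmark summary in (3).
benchmark_mean_ms = 269.9458

ratio = profiler_total_ms / benchmark_mean_ms
print(f"profiler total: {profiler_total_ms:.2f} ms")
print(f"benchmark mean: {benchmark_mean_ms:.2f} ms")
print(f"ratio: {ratio:.2f}x")
```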




