Hi @tkonolige
Sorry for the delay in this response.
I modified the target to "llvm -mcpu=cascadelake" to match the CPU and re-did
the tuning. Now I get a much better inference time of < 100 ms from both
benchmark and VirtualMachineProfiler, but a 4x discrepancy still remains
between the outputs of the two profilers.
[2] Without graph tuning
(a) profiler_vm
```
One or more operators have not been tuned. Please tune your model for better
performance. Use DEBUG logging level to see more details.
Name                                 Duration (us)   Percent
```
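For reference, the profiler_vm numbers above come from roughly this setup (a
sketch from memory, with the ResNet-50 workload from relay.testing standing in
for my model):

```
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.runtime import profiler_vm

# Target from this post; ResNet-50 as a stand-in workload.
target = tvm.target.Target("llvm -mcpu=cascadelake")
dev = tvm.cpu()

mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

# Compile for the Relay VM and profile one inference; the report is the
# per-operator table quoted above.
exe = relay.vm.compile(mod, target=target, params=params)
vm = profiler_vm.VirtualMachineProfiler(exe, dev)
report = vm.profile(tvm.nd.array(data, dev), func_name="main")
print(report)
```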
I cannot reproduce the results you are getting. For me, the graph runtime and
the VM are within 10% of each other in profiling. And they are pretty close to
the benchmark results too.
Here are some questions that might help you debug this:
- Have you tried running on a different machine?
- Ha
I have run both profilers multiple times. The vm_profiler's inference times
are consistently 270-272 ms, while the debug_executor's are in the range
800 ms to 1.2 s.
Here is the whole code just in case:
```
import numpy as np
import pytest
from io import StringIO
import csv
import os
import json
```
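(The paste above is cut off after the imports; the profiling part of the
script is essentially the following sketch. The get_graph_json/get_lib
accessors on the build artifact are from memory, so treat them as
assumptions.)

```
import numpy as np
import tvm
from tvm import relay
from tvm.contrib.debugger import debug_executor
from tvm.relay import testing

target = tvm.target.Target("llvm -mcpu=cascadelake")
dev = tvm.cpu()

# ResNet-50 stand-in; the input name for this workload is "data".
mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)
data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

# Build for the graph executor, then wrap the result in the debug
# executor, which reports per-operator timings.
lib = relay.build(mod, target=target, params=params)
m = debug_executor.create(lib.get_graph_json(), lib.get_lib(), dev)
report = m.profile(data=tvm.nd.array(data, dev))
print(report)
```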
I'm surprised you are seeing such a large difference. Can you try running the
profiler multiple times (in the same script) and see if the results are
consistent?
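Something along these lines would do it (a sketch; `vm`, `data`, and `dev` are
the objects from your script):

```
import tvm

def profile_n_times(vm, data, dev, n=5):
    # Run the VM profiler several times in one process; comparing the
    # printed reports shows whether the numbers drift between runs
    # (cache warm-up, CPU frequency scaling, etc.).
    for i in range(n):
        report = vm.profile(tvm.nd.array(data, dev), func_name="main")
        print("=== run %d ===" % i)
        print(report)
```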
@tkonolige Thank you for responding.
I just want to find out how much time is spent on data layout transformations
while running inference on ResNet-50. profiler_vm seems to report a much lower
inference cost (1) than debug_executor (2). Does this not contradict your
statement that the graph runtime and the VM profilers should be within 10% of
each other?
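To be concrete, here is roughly how I am pulling the layout-transform cost out
of a report (a sketch: Report.csv() and the "Name" / "Duration (us)" columns
match the header quoted above, and matching on the op-name substring is my own
heuristic):

```
import csv
from io import StringIO

def layout_transform_us(report):
    # Sum the time attributed to layout-transform operators in a
    # tvm.runtime.profiling.Report.
    total = 0.0
    for row in csv.DictReader(StringIO(report.csv())):
        if "layout_transform" in row["Name"]:
            # Strip thousands separators in case the column is formatted.
            total += float(row["Duration (us)"].replace(",", ""))
    return total
```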