There are some possibilities:
1. Try using `pick_best` to select the best config for each workload in the log file. AutoTVM applies only one best config per workload, chosen across all tasks that share that workload. In other words, if you tune both `direct` and `winograd` for the same conv2d workload and put their records in the same log file, only the faster of the two will be applied.
2. The second build may be reusing the cached compilation result. Try calling `compile_engine.get().clear()` before each `relay.build` so that you get a genuine performance comparison between the two configs.
3. Also note that you may not get the same performance as shown in the log file, because the latency recorded there was measured on a single function compiled by TVM for the LLVM target. The graph runtime, on the other hand, adds its own overhead.
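The selection rule in point 1 can be modeled in a few lines of plain Python. This is a simplified sketch, not TVM's implementation: the `Record` type, the `pick_best_per_workload` function and the example shapes and latencies are hypothetical, and a workload is represented as a plain tuple. It keeps only the lowest-latency record per workload, regardless of which template (`direct` or `winograd`) produced it.

```python
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical, simplified stand-in for one tuning-log entry."""
    workload: Tuple    # identifies the operator and its shapes
    template: str      # e.g. "direct" or "winograd"
    config_id: int     # index of the tuned config within its template
    latency_s: float   # measured latency in seconds


def pick_best_per_workload(records: List[Record]) -> Dict[Tuple, Record]:
    """Keep only the fastest record for each workload, ignoring the template."""
    best: Dict[Tuple, Record] = {}
    for rec in records:
        current = best.get(rec.workload)
        if current is None or rec.latency_s < current.latency_s:
            best[rec.workload] = rec
    return best


# Two templates tuned for the same conv2d workload, plus an unrelated dense workload.
conv = ("conv2d", (1, 64, 56, 56), (64, 64, 3, 3))
dense = ("dense", (1, 512), (1000, 512))
log = [
    Record(conv, "direct", 17, 4.1e-4),
    Record(conv, "winograd", 3, 2.9e-4),
    Record(conv, "direct", 42, 3.5e-4),
    Record(dense, "direct", 5, 1.2e-4),
]

best = pick_best_per_workload(log)
for workload, rec in best.items():
    print(workload[0], rec.template, rec.config_id)
```

With these made-up numbers the `winograd` record wins for the conv2d workload, so none of the `direct` configs for that workload would ever be applied. In TVM itself, the corresponding steps are `autotvm.record.pick_best(in_file, out_file)` to distill a log and `autotvm.apply_history_best(out_file)` as the context under which `relay.build` is called.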