> 1. When tuning using AutoTVM (`tvmc tune` with `--enable-autoscheduler` 
> disabled), one of the metrics printed in the console is Current/Best GFLOPS 
> per task. I do not understand how this metric is measured or calculated 
> here. In the context of tuning a model, what is this metric describing?

The GFLOP/s is measured by actually running the compiled operator on the device. 
Given a schedule candidate, we compile the operator, run it on the device to 
measure its latency, and compute the throughput as `FLOPs / latency`. Note that 
this metric is per task, and a model may contain several tasks, so AutoTVM tunes 
every unique task sequentially to achieve good end-to-end performance.
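
As a rough illustration, here is a minimal sketch of that calculation in Python 
(the operator and numbers are hypothetical; AutoTVM derives the FLOP count 
statically from the operator definition and measures the latency on the target 
device):

```python
# Minimal sketch of how a per-task GFLOP/s figure is derived (illustrative,
# not AutoTVM's actual implementation). The FLOP count is a static property
# of the operator; only the latency comes from a run on the device.

def gflops(task_flops: float, measured_latency_s: float) -> float:
    """Throughput = work / time, scaled to giga-FLOPs per second."""
    return task_flops / measured_latency_s / 1e9

# Hypothetical numbers: a conv2d task with ~231 MFLOPs measured at 0.8 ms.
conv2d_flops = 231e6
latency_s = 0.8e-3
print(f"{gflops(conv2d_flops, latency_s):.1f} GFLOP/s")  # ~288.8 GFLOP/s
```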

> 2. When using AutoTVM, console data consists of 
> Task/GFLOPS/Progress/Walltime. When using Ansor, data provided includes 
> ID/Latency/Speed/Trials, while also including additional data like GA iter, 
> fail_ct, min/max score, etc. What are the differences and similarities 
> between the data provided by these two services, or are these details covered 
> in the documentation somewhere that I’m missing? Without this info, 
> interpreting tuning runs can be pretty challenging, especially from an 
> entry-level perspective.

They just use different approaches. Ansor uses random sampling combined with 
evolutionary search to find the best schedule, so "GA iter" is the current 
iteration of the evolutionary (genetic-algorithm) search, and "fail_ct" is a 
counter of the invalid schedules explored in that iteration. The max/min scores 
are the highest and lowest schedule quality estimated by the performance cost 
model, not measured GFLOP/s.
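
In rough, runnable pseudocode, the loop that produces those log fields looks 
something like the sketch below (the "schedules", cost model, and helpers are 
all toy stand-ins, not Ansor's actual API):

```python
# Illustrative sketch of an evolutionary schedule search, mapping each Ansor
# log field to where it would come from. Plain floats stand in for schedule
# candidates, and a candidate's value doubles as its cost-model score.
import random

def sample_random(n):
    return [random.random() for _ in range(n)]       # random initial schedules

def mutate_and_crossover(population):
    return [p + random.gauss(0, 0.1) for p in population]

def is_valid(candidate):
    return 0.0 <= candidate <= 1.0                   # e.g. fits in shared memory

def evolutionary_search(n_iters=4, population_size=128):
    population = sample_random(population_size)
    for ga_iter in range(n_iters):                   # -> "GA iter" in the log
        candidates = mutate_and_crossover(population)
        valid = [c for c in candidates if is_valid(c)]
        fail_ct = len(candidates) - len(valid)       # -> "fail_ct"
        scores = valid                               # cost-model predictions
        # -> "max score" / "min score": predicted quality, not measured GFLOP/s
        print(f"GA iter {ga_iter}: fail_ct={fail_ct}, "
              f"max={max(scores):.2f}, min={min(scores):.2f}")
        population = sorted(valid, reverse=True)[:population_size]
    return population

evolutionary_search()
```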

> 3. Finally, this question might stem from my lack of understanding of GFLOPS 
> in the context of tuning a model, but the GFLOPS data that results from using 
> Ansor is significantly lower than that of AutoTVM (when tuning the same model 
> with the same tuning parameters). Does a higher GFLOP value indicate a better 
> or worse tuned schedule?

Higher GFLOP/s does indicate better performance. Did you compare the end-to-end 
model performance after tuning and find that the model tuned by Ansor is worse 
than the one tuned by AutoTVM? There are several possible reasons for seeing 
lower GFLOP/s in Ansor than in AutoTVM when looking at a single task, so you 
would need to provide more information for people to help dig into the root 
cause. For example, the tasks extracted by Ansor and AutoTVM are different, so 
it is not meaningful to simply compare the GFLOP/s of, say, the first task from 
each framework. The number of tuning trials also affects the GFLOP/s per task, 
because Ansor uses a task scheduler to prioritize important tasks, so less 
important tasks may receive fewer trials.
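
To make the task-mismatch point concrete, here is a toy comparison with 
made-up numbers (the FLOP counts and latencies are invented purely for 
illustration):

```python
# "Task 0" may cover different work in each framework; for instance, Ansor
# often fuses surrounding ops into a task, so comparing per-task GFLOP/s
# across frameworks is apples to oranges.

autotvm_task0 = {"flops": 231e6, "latency_s": 0.50e-3}  # a lone conv2d
ansor_task0 = {"flops": 240e6, "latency_s": 0.65e-3}    # conv2d + fused bias/ReLU

for name, t in [("AutoTVM task 0", autotvm_task0), ("Ansor task 0", ansor_task0)]:
    print(f"{name}: {t['flops'] / t['latency_s'] / 1e9:.0f} GFLOP/s")
# AutoTVM task 0: 462 GFLOP/s
# Ansor task 0: 369 GFLOP/s

# The lower per-task number says nothing by itself about end-to-end latency;
# compare the compiled models' total run times instead.
```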
