> 1. When tuning using AutoTVM (`tvmc tune` with `--enable-autoscheduler` disabled), one of the metrics printed in the console is Current/Best GFLOPS per task. I do not understand how this metric is being measured or calculated here. In the context of tuning a model, what is this metric describing?
The GFLOP/s is measured by actually running the compiled operator on the device. For each schedule candidate, AutoTVM compiles the operator, runs it on the device to get the latency, and calculates the throughput as `FLOPS / latency`. Note that the metric is per task; a model may contain several tasks, so AutoTVM tunes every unique task sequentially to achieve good end-to-end performance.

> 2. When using AutoTVM, console data consists of Task/GFLOPS/Progress/Walltime. When using Ansor, data provided includes ID/Latency/Speed/Trials, while also including additional data like GA iter, fail_ct, min/max score, etc. What are the differences and similarities between the data provided by these two services, or are these details covered in the documentation somewhere that I'm missing? Without this info, interpreting tuning runs can be pretty challenging, especially from an entry-level perspective.

They simply use different approaches. Ansor uses random sampling with evolutionary search to find the best schedule, so "GA iter" reports the progress of the evolutionary search, "fail_ct" counts the invalid schedules explored in that iteration, and the max/min scores are the maximum/minimum schedule quality estimated by the performance cost model.

> 3. Finally, this question might stem from my lack of understanding of GFLOPS in the context of tuning a model, but the GFLOPS data that results from using Ansor is significantly lower than that of AutoTVM (when tuning the same model with the same tuning parameters). Does a higher GFLOP value indicate a better or worse tuned schedule?

Higher GFLOP/s does indicate better performance. Did you compare the end-to-end model performance after tuning and find that the model tuned by Ansor is worse than the one tuned by AutoTVM? There could be several reasons why you see lower GFLOP/s for a single task in Ansor compared to AutoTVM, and you would need to provide more information for people to help dive into the root cause.
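To make the `FLOPS / latency` calculation concrete, here is a minimal sketch in plain Python. The FLOP count and latency below are made-up illustrative numbers, not values from any real tuning run:

```python
# Hypothetical numbers for illustration only.
# FLOP count of the operator, e.g. a 512x512x512 matmul needs 2*N^3 FLOPs.
flop_count = 2 * 512 * 512 * 512

# Latency measured by actually running the compiled operator on the device,
# in seconds (made-up value here).
latency_s = 0.0021

# Throughput reported in the console as "GFLOPS" for this task.
gflops = flop_count / latency_s / 1e9
print(f"{gflops:.2f} GFLOPS")
```

A higher number simply means the same fixed amount of arithmetic finished in less time, which is why the console tracks the best GFLOP/s seen so far per task.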
For example, the tasks extracted by Ansor and AutoTVM are different, so it is incorrect to simply compare the GFLOP/s of, say, the first task from both frameworks. The number of tuning trials can also affect the per-task GFLOP/s, because Ansor uses a task scheduler to prioritize important tasks.

---

[Visit Topic](https://discuss.tvm.apache.org/t/autotvm-vs-autoscheduler-tuning-metrics/10300/3) to respond.