This is an interesting observation. I haven't personally worked with TVM
schedules before, and I haven't seen how LLVM's loop transformations affect
TVM's own loop transformations, positively or negatively. But here are my two
cents: I think we probably should not disable it outright. Unrolling is just one of op
It's an interesting issue that LLVM optimizations could interfere with TVM's
own. But they sometimes also help improve performance. We probably need to
study a few other ops, as well as the x86 target, to see whether disabling
unrolling causes performance regressions. Otherwise, I have no objection.
I have been working on TVM schedules for ARM. One thing I have noticed is that
LLVM has its own unrolling heuristics, which can completely mess up the
analysis one does for unrolling in TVM.
For example, a developer can choose to unroll a particular axis with the goal
of better reuse utili
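To make the idea of unrolling an axis for reuse concrete, here is a minimal sketch in plain Python (not TVM code). It assumes a simple matrix multiply and unrolls the `j` axis by 4, so each loaded value of `A` is reused across four accumulators. The function names `matmul_naive` and `matmul_unroll_j4` are hypothetical, and the comment above is cut off, so this only illustrates the general technique, not the author's specific schedule. This deliberately chosen structure is the kind of thing LLVM's own unrolling heuristics may then transform further.

```python
def matmul_naive(A, B):
    """Reference matrix multiply: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_unroll_j4(A, B):
    """Same computation with the j axis unrolled by a factor of 4.

    Each A[i][k] is read once and reused for four output columns, the
    kind of register reuse an explicit unroll is meant to expose.
    Assumption for this sketch: the column count of B is a multiple of 4.
    """
    n, m, p = len(A), len(B), len(B[0])
    assert p % 4 == 0, "sketch assumes p is a multiple of 4"
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(0, p, 4):
            # Four independent accumulators, one per unrolled column.
            c0 = c1 = c2 = c3 = 0.0
            for k in range(m):
                a = A[i][k]  # loaded once, reused four times below
                row = B[k]
                c0 += a * row[j]
                c1 += a * row[j + 1]
                c2 += a * row[j + 2]
                c3 += a * row[j + 3]
            C[i][j], C[i][j + 1], C[i][j + 2], C[i][j + 3] = c0, c1, c2, c3
    return C
```

Both versions accumulate over `k` in the same order, so they produce identical results; only the loop structure (and therefore the reuse pattern) differs.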
Given that it is not easy to get multiple versions of the function compiled
together, I tried to see whether `schedule_batch_matmul` could be changed to
work for dynamic input sizes (at the expense of being slightly less
efficient). It seems
that [this line and the one
above](https://github.com/apach