What we did in our backend is that when the user creates a target, they can
pass extra arguments to the `tvm.target.hexagon()` function. These arguments
include an optional list of options that is forwarded to the LLVM backend.
These are the "internal" LLVM options (i.e. those that you pass directly to
LLVM tools such as `opt`/`llc`, or via `-mllvm` in clang).
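For illustration, here is roughly what that looks like (the keyword argument name and the specific LLVM option below are just examples, not a fixed API; adapt to your TVM build):

```python
import tvm

# Hedged sketch: "llvm_options" is assumed to be the keyword our backend
# exposes for forwarding internal LLVM options; "-unroll-threshold=0" is
# just one example of such an option (it suppresses LLVM's loop unroller).
target = tvm.target.hexagon(
    "v68",
    llvm_options=["-unroll-threshold=0"],
)
```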
This is quite a valuable topic, which can help us figure out what kind of
optimization-related information we can get from the TVM IR itself, after all
the LLVM optimization passes have been applied. For x86 conv2d, my observation
is that the work LLVM's unrolling is doing can be implemented in the TVM
schedule itself, as sketched below.
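As a toy illustration (not the actual conv2d schedule; shapes and the split factor are made up), the same effect can be expressed directly in a TE schedule so we do not rely on LLVM's unroller:

```python
import tvm
from tvm import te

# Express the unrolling decision explicitly in the TVM schedule.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
outer, inner = s[B].split(B.op.axis[0], factor=4)
s[B].unroll(inner)  # unroll the inner loop ourselves instead of leaving it to LLVM

print(tvm.lower(s, [A, B], simple_mode=True))
```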
In my experience, plain loop unrolling has always been a blunt hammer and is not
useful in the general case, so turning it off by default in LLVM makes sense.
Targeted unrolling, combined with vectorization and other loop optimizations, is
more beneficial.
I hadn't realized that LLVM turned on plain loop unrolling by default.
Thanks for sharing your thoughts.
Let me share some more background. To achieve high performance for compute-heavy
ops (close to hand-written kernels like MKLDNN or ACL), we need to perform
vector register tiling. This is one level lower than cache tiling. Here, we have
to carefully craft the innermost loops so that the working set stays in vector
registers.
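A rough matmul-style sketch of what I mean (tile sizes are made up, not tuned): cache tiling on the outer loops, then a register tile that is unrolled and vectorized so the accumulators stay in vector registers.

```python
import tvm
from tvm import te

# Illustrative two-level tiling: a cache tile, then a register tile that is
# unrolled/vectorized so its accumulators live in vector registers.
M, N, K = 256, 256, 256
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
i, j = s[C].op.axis
# cache-level tiling
io, jo, ii, ji = s[C].tile(i, j, x_factor=32, y_factor=32)
# register-level tiling inside the cache tile
iio, jio, iii, jii = s[C].tile(ii, ji, x_factor=4, y_factor=16)
ko, ki = s[C].split(s[C].op.reduce_axis[0], factor=4)
s[C].reorder(io, jo, iio, jio, ko, ki, iii, jii)
s[C].unroll(iii)      # keep the 4x16 register tile ...
s[C].vectorize(jii)   # ... in vector registers

print(tvm.lower(s, [A, B, C], simple_mode=True))
```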
@anijain2305: Thanks for such great findings! I wonder whether it is really
possible to make enabling and disabling such low-level optimizations dynamic
rather than static, so that the decision depends on measured performance on the
target rather than on the state of TVM's own optimizations.
In this case too, we could let measurements decide instead of hard-coding the
choice; a sketch follows.
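One way this could look (purely a sketch; the template name, shapes and knob name are hypothetical) is to expose the unrolling decision as an AutoTVM knob, so whichever variant measures faster on the actual target wins:

```python
from tvm import autotvm, te

# Hypothetical tuning template: the unrolling decision becomes a knob that
# the tuner evaluates on the target, rather than a static yes/no.
@autotvm.template("example/vec_add")
def vec_add(n):
    cfg = autotvm.get_config()
    A = te.placeholder((n,), name="A")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)
    outer, inner = s[B].split(B.op.axis[0], factor=8)

    cfg.define_knob("unroll", [0, 1])
    if cfg["unroll"].val:
        s[B].unroll(inner)  # let measured performance decide whether this pays off
    return s, [A, B]
```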
This is an interesting observation. I personally haven't played with the
schedules in TVM before, and I haven't seen how LLVM's loop transformations
would affect TVM's, positively or negatively. But here are my two cents: I think
we probably should not disable it outright. Unrolling is just one of the
optimizations LLVM applies.
It's an interesting issue that LLVM's optimizations can interfere with TVM's,
but sometimes they also help improve performance. We probably need to study a
few other ops on the x86 target and see whether disabling unrolling would cause
performance regressions. Otherwise, I have no objection. A rough measurement
sketch is below.
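Something like the following could serve as the measurement harness (shapes and the target string are illustrative); the variant with LLVM unrolling suppressed would be built with whatever option-passing mechanism the backend exposes, as discussed above:

```python
import numpy as np
import tvm
from tvm import te

# Build the same schedule under a given target and report the mean runtime,
# so that "disable LLVM unrolling" becomes a data-driven decision per op/target.
def measure(target):
    n = 1 << 20
    A = te.placeholder((n,), name="A")
    B = te.compute((n,), lambda i: A[i] * 2.0, name="B")
    s = te.create_schedule(B.op)
    lib = tvm.build(s, [A, B], target=target, name="scale")

    dev = tvm.cpu(0)
    a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    timer = lib.time_evaluator("scale", dev, number=100)
    return timer(a, b).mean

baseline = measure("llvm -mcpu=core-avx2")
# The second variant would pass whatever option your backend uses to turn off
# LLVM's unroller, and the two means are compared op by op.
print("baseline mean (s):", baseline)
```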
I have been working on TVM schedules for ARM. One thing that I have noticed is
that LLVM has its own unrolling heuristics, which can completely mess up the
analysis that one does for unrolling in TVM.
For example, a developer can choose to unroll a particular axis with the goal
of better register reuse and utilization, only to have LLVM's heuristics rewrite
the loop structure afterwards; see the sketch below.
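For instance (illustrative sketch, assuming an AArch64-enabled LLVM build; the triple and split factor are made up), one can dump the LLVM IR after building and check whether the loop structure the schedule asked for survived:

```python
import tvm
from tvm import te

# Unroll one axis explicitly for register reuse, then inspect the LLVM IR
# to see whether LLVM's own unrolling heuristics rewrote the loops.
n = 4096
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
outer, inner = s[B].split(B.op.axis[0], factor=4)
s[B].unroll(inner)  # the unrolling we analyzed for in the schedule

target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon"
lib = tvm.build(s, [A, B], target=target, name="add_one")
print(lib.get_source("ll"))  # compare the emitted loop bodies against the schedule
```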