I am facing the same problem with prefetching data from shared memory to registers.
Did you solve this?
---
[Visit Topic](https://discuss.tvm.apache.org/t/how-to-i-use-prefetch-with-gpu-codegen/7294/2) to respond.
Hi everyone,
For the CUDA target, I first fetch data from global memory to shared memory, and then I want to build a software pipeline by prefetching some of that data from shared memory into registers, since a shared memory request can take tens of cycles and sometimes even longer.
However, the underlying pr
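For reference, the staging being described can be written with the TE schedule API roughly as below. This is only a sketch under assumptions of my own (a small square matmul with placeholder tile sizes, and `cache_read` staging plus `double_buffer` rather than the `prefetch` primitive itself); the commented-out `double_buffer` calls on the register stages mark the shared-to-register step this thread is asking about.

```python
import tvm
from tvm import te

# A small square matmul standing in for the real kernel; sizes are placeholders.
N = 1024
A = te.placeholder((N, N), name="A")
B = te.placeholder((N, N), name="B")
k = te.reduce_axis((0, N), name="k")
C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)

# Staging: global -> shared -> registers ("local"), plus a register accumulator.
AA = s.cache_read(A, "shared", [C])
BB = s.cache_read(B, "shared", [C])
AL = s.cache_read(AA, "local", [C])
BL = s.cache_read(BB, "local", [C])
CL = s.cache_write(C, "local")

num_thread = 16
block_x, block_y = te.thread_axis("blockIdx.x"), te.thread_axis("blockIdx.y")
thread_x, thread_y = te.thread_axis("threadIdx.x"), te.thread_axis("threadIdx.y")

# Each block computes a 16x16 output tile, one element per thread (kept tiny on purpose).
by, yi = s[C].split(C.op.axis[0], factor=num_thread)
bx, xi = s[C].split(C.op.axis[1], factor=num_thread)
s[C].reorder(by, bx, yi, xi)
s[C].bind(by, block_y)
s[C].bind(bx, block_x)
s[C].bind(yi, thread_y)
s[C].bind(xi, thread_x)

# Accumulate in registers and tile the reduction, so there is a serial loop to pipeline over.
s[CL].compute_at(s[C], xi)
ko, ki = s[CL].split(CL.op.reduce_axis[0], factor=num_thread)

# One K-slice of A and B goes to shared memory per ko iteration (cooperative fetch);
# each thread's operands go to registers per ki iteration.
s[AA].compute_at(s[CL], ko)
s[BB].compute_at(s[CL], ko)
s[AL].compute_at(s[CL], ki)
s[BL].compute_at(s[CL], ki)
for load in (AA, BB):
    ty, _ = s[load].split(s[load].op.axis[0], nparts=num_thread)
    tx, _ = s[load].split(s[load].op.axis[1], nparts=num_thread)
    s[load].bind(ty, thread_y)
    s[load].bind(tx, thread_x)

# double_buffer is the TE pipelining knob: the next tile's global->shared copy
# can overlap with compute on the current tile.
s[AA].double_buffer()
s[BB].double_buffer()
# The analogous hint on the register stages is the shared->register prefetch this
# thread is asking about; whether codegen actually pipelines it is the open question.
# s[AL].double_buffer()
# s[BL].double_buffer()

print(tvm.lower(s, [A, B, C], simple_mode=True))
# mod = tvm.build(s, [A, B, C], target="cuda")  # needs a CUDA-enabled TVM build
```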
Thanks for your reply. I see TVM's C++ API now; I'll give it a try.
---
[Visit Topic](https://discuss.tvm.apache.org/t/profiling-tvm-module/7870/5) to respond.
I see, but perf only gives summary statistics for the whole program, which adds a lot of noise when profiling a single Module. It seems there is no tool that lets us inspect cache behavior for just a Python code region.
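One workaround, not TVM-specific, is to attach `perf stat` to the Python process only for the duration of the region you care about. It is still a time-window measurement rather than true per-region attribution (everything the process does in that window is counted), and it assumes `perf` is installed with permission to attach, but it keeps most of the unrelated setup code out of the numbers. A rough sketch, with the actual TVM module call left as a hypothetical placeholder:

```python
import os
import signal
import subprocess
import time

def start_perf(events=("cache-references", "cache-misses", "LLC-loads", "LLC-load-misses")):
    """Attach `perf stat` to the current process; counting starts roughly now."""
    cmd = ["perf", "stat", "-e", ",".join(events), "-p", str(os.getpid())]
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    time.sleep(0.5)  # crude: give perf a moment to attach before the region begins
    return proc

def stop_perf(proc):
    """Interrupt perf; it prints the counters for the attached window to stderr."""
    proc.send_signal(signal.SIGINT)
    _, stats = proc.communicate()
    return stats

# --- usage: wrap only the region of interest (module name below is hypothetical) ---
perf = start_perf()
# module.run()   # e.g. a graph executor module, run enough times to dominate the window
stats = stop_perf(perf)
print(stats)
```

The LLC event names depend on the CPU; `perf list` shows what is available locally.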
---
Hey, I want to profile a TVM module, e.g. cache misses, LLC (last-level cache) misses, etc.
How can I do this?
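For wall-clock timing, TVM has a built-in helper, `time_evaluator`, on the built module; hardware counters such as cache misses are not reported by TVM itself, so they usually come from an external profiler like Linux perf, as discussed elsewhere in this thread. A minimal sketch of the built-in part, with a toy workload and the llvm target as placeholders:

```python
import numpy as np
import tvm
from tvm import te

# Toy workload just to have something to measure.
n = 1 << 20
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
func = tvm.build(s, [A, B], target="llvm")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

# Built-in wall-clock profiling: run the function repeatedly and report timings.
evaluator = func.time_evaluator(func.entry_name, dev, number=100, repeat=3)
print("mean time: %g s" % evaluator(a, b).mean)

# Hardware counters (cache misses, LLC misses, ...) need an external profiler,
# e.g. running the whole script under
#   perf stat -e cache-references,cache-misses python this_script.py
```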
---
[Visit Topic](https://discuss.tvm.apache.org/t/profiling-tvm-module/7870/1) to respond.