Hi everyone,

For the CUDA target, I first fetch data from global memory into shared memory. 
I then want to build a software pipeline by prefetching data from shared memory 
into registers, since a shared memory request can take tens of cycles and 
sometimes even longer.
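
To make the intended pattern concrete, here is a minimal host-side C++ sketch of the loop structure. It is only an emulation under stated assumptions, not TVM-generated code and not a real CUDA kernel: the `std::array` tiles stand in for shared memory, the local variables `cur_*` and `next_*` stand in for registers, and the names `kTile`, `pipelined_dot`, and `plain_dot` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tile size; stands in for one shared-memory tile.
constexpr std::size_t kTile = 8;

// Dot product over one tile with a register-level software pipeline.
// smem_a / smem_b emulate shared-memory tiles; cur_* / next_* emulate
// registers. The load for iteration k + 1 is issued before the
// multiply-add of iteration k, so on a GPU the load latency could
// overlap with the arithmetic.
float pipelined_dot(const std::array<float, kTile>& smem_a,
                    const std::array<float, kTile>& smem_b) {
    float acc = 0.0f;

    // Prologue: load the operands for iteration 0.
    float cur_a = smem_a[0];
    float cur_b = smem_b[0];

    for (std::size_t k = 0; k < kTile; ++k) {
        float next_a = 0.0f;
        float next_b = 0.0f;

        // Prefetch the operands for iteration k + 1, if there is one.
        if (k + 1 < kTile) {
            next_a = smem_a[k + 1];
            next_b = smem_b[k + 1];
        }

        // Compute with the operands that were loaded one iteration earlier.
        acc += cur_a * cur_b;

        // Rotate the "registers" for the next iteration.
        cur_a = next_a;
        cur_b = next_b;
    }
    return acc;
}

// Reference version without prefetching: load and use in the same iteration.
float plain_dot(const std::array<float, kTile>& smem_a,
                const std::array<float, kTile>& smem_b) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < kTile; ++k) {
        acc += smem_a[k] * smem_b[k];
    }
    return acc;
}
```

Both functions accumulate in the same order and return the same result. The only difference is that `pipelined_dot` separates each load from its use by one iteration, which is the behavior I would like the generated CUDA code to have for shared memory reads.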

However, the underlying prefetch pass (storage_flatten.cc) checks whether the 
prefetched buffer is in buf_map_. If I understand correctly, buf_map_ only 
contains entries with "global" scope. So how can I prefetch from shared 
memory?
