Actually, Ansor does use "local" memory in some special cases.

We tried the two-level cache read structure at the beginning, when we were 
building the Ansor system, and it is still easy for Ansor to add such sketches 
in the current main branch.
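
For reference, a two-level cache read roughly looks like the following in TVM's TE schedule language. This is a hand-written sketch rather than Ansor's generated code; the tensor shapes, tile sizes, and compute_at choices are only illustrative.

```python
import tvm
from tvm import te

# Illustrative matmul; the shapes and tile sizes are arbitrary.
M = N = K = 1024
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)

# Level 1: stage the inputs from global memory into shared memory.
AS = s.cache_read(A, "shared", [C])
BS = s.cache_read(B, "shared", [C])
# Level 2: stage from shared memory into thread-private "local" buffers.
AL = s.cache_read(AS, "local", [C])
BL = s.cache_read(BS, "local", [C])

# Tile the output so each cache stage has a loop level to attach to.
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=32)
jo, ji = s[C].split(j, factor=32)
s[C].reorder(io, jo, ii, ji)

# Every added cache stage needs its own compute_at decision -- this is the
# extra choice the search would have to make for each sketch.
s[AS].compute_at(s[C], jo)
s[BS].compute_at(s[C], jo)
s[AL].compute_at(s[C], ji)
s[BL].compute_at(s[C], ji)

print(tvm.lower(s, [A, B, C], simple_mode=True))
```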

Ansor is a tuning-based schedule search system, which means that more options 
lead to a larger search space, and that results in more time to find a good 
schedule. Ansor would need to try different compute_at locations for the newly 
added cache read stage, so this is more of a trade-off. If we had an approach 
that could find the best schedule in an infinite search space in little time, 
I would be glad to add everything I know to the sketch policy.
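
As a rough, hypothetical illustration (not Ansor's actual search-space accounting): if every cache stage can be attached at any of the output's tile levels, the number of compute_at combinations grows exponentially with the number of stages, which is why each extra level of caching makes tuning noticeably slower.

```python
# Rough combinatorial count, assuming each cache stage independently picks
# one of `n_levels` attachment points. Numbers are illustrative only.
n_levels = 4              # assumed tile depth of the output stage
single_level_stages = 2   # "shared" cache reads for the two inputs
two_level_stages = 4      # "shared" + "local" cache reads for the two inputs

print(n_levels ** single_level_stages)  # 16 candidate placements
print(n_levels ** two_level_stages)     # 256 candidate placements
```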

The "shared" memory scope is important for a CUDA schedule; it means putting 
the data into shared memory. The "local" memory scope, on the other hand, is 
not really necessary: even without it, the lower-level compiler will still try 
to keep the data in registers for better performance.
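
A minimal self-contained sketch of that point, with illustrative names and sizes: only a "shared" stage is created explicitly, and the value read in the inner loop is left for the backend compiler to keep in a register.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.compute((n, n), lambda i, j: A[i, j] + 1.0, name="B")

s = te.create_schedule(B.op)
# Explicit "shared" staging: this is the scope that matters on CUDA.
AS = s.cache_read(A, "shared", [B])

io, ii = s[B].split(B.op.axis[0], factor=32)
s[AS].compute_at(s[B], io)

# No s.cache_read(AS, "local", [B]) here: the single element read in the
# inner loop ends up in a register without an explicit "local" stage.
print(tvm.lower(s, [A, B], simple_mode=True))
```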

We do use the "local" scope in some schedules such as TensorCore, which 
requires an explicit specification of the wmma buffers.
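
For completeness, the TensorCore case looks roughly like this: the wmma intrinsics operate on dedicated fragment buffers, so their scopes have to be named explicitly in the schedule. This is a hand-written sketch with illustrative shapes, not the full tensorized schedule.

```python
import tvm
from tvm import te

# A single 16x16x16 fragment, just to show the explicit scopes.
n = 16
A = te.placeholder((n, n), dtype="float16", name="A")
B = te.placeholder((n, n), dtype="float16", name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute(
    (n, n),
    lambda i, j: te.sum(A[i, k].astype("float32") * B[k, j].astype("float32"), axis=k),
    name="C",
)

s = te.create_schedule(C.op)
# The wmma fragment buffers must be specified explicitly; they are not
# something the lower-level compiler can infer the way it does for plain
# register ("local") data.
AF = s.cache_read(A, "wmma.matrix_a", [C])
BF = s.cache_read(B, "wmma.matrix_b", [C])
CF = s.cache_write(C, "wmma.accumulator")
```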




