TVM has a warp memory abstraction. If you use `allocate((128,), 'int32', 
'warp')`, TVM will put the data in thread-local registers and then use shuffle 
operations to make the data available to other threads in the warp. You can 
also use the shuffles directly if you want. I'm not sure exactly how to use 
warp shuffles in hybrid script, but you can grep the codebase for 
`tvm_warp_shuffle`.
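
To make the idea concrete, here is a rough pure-Python model of what "registers plus shuffles" means. It is only a sketch: it does not call TVM, the cyclic layout (element `i` lives in lane `i % 32`) is my assumption and not necessarily what TVM's lowering generates, and `distribute`, `shuffle` and `warp_load` are made-up helper names.

```python
WARP_SIZE = 32  # lanes (threads) per warp on NVIDIA GPUs


def distribute(buffer, warp_size=WARP_SIZE):
    """Split a warp-scope buffer into per-lane register files.

    Assumed layout (illustrative only): regs[lane][slot] holds
    buffer[slot * warp_size + lane].
    """
    assert len(buffer) % warp_size == 0
    per_lane = len(buffer) // warp_size
    return [
        [buffer[slot * warp_size + lane] for slot in range(per_lane)]
        for lane in range(warp_size)
    ]


def shuffle(regs, slot, src_lane):
    """Model of a warp shuffle: read register `slot` owned by `src_lane`."""
    return regs[src_lane][slot]


def warp_load(regs, index):
    """Read logical element `index` of the warp buffer from any lane."""
    warp_size = len(regs)
    return shuffle(regs, index // warp_size, index % warp_size)


# A 128-element int32 buffer becomes 4 registers in each of the 32 lanes.
buffer = list(range(100, 228))
regs = distribute(buffer)
value = warp_load(regs, 37)  # register 1 of lane 5
```

With this layout no lane stores the whole buffer, and a read of an element owned by another lane becomes one shuffle instead of a trip through shared memory.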





---
[Visit Topic](https://discuss.tvm.apache.org/t/tvm-cuda-warp-level-sync/8043/2) 
to respond.
