I have dug into this further and now I understand why there is no asynchronous 
memory access: TVM's OpenCL backend was designed with GPUs in mind, and a GPU 
hides memory latency by switching to another warp when the active one stalls 
on a memory access.

While this is completely justified for CUDA, I think there should be 
asynchronous memory access in the OpenCL backend, since OpenCL is meant to 
target generic devices, and people who want to use the OpenCL backend for 
devices without that kind of hardware latency hiding will likely run into 
performance issues because of this.
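
To make concrete what I mean, here is a minimal sketch of double buffering in 
plain OpenCL C, using the standard `async_work_group_copy` and 
`wait_group_events` built-ins. The kernel name, the tile size, and the trivial 
computation are placeholders of my own, not anything from TVM:

```c
#define TILE 64  // hypothetical tile size, just for illustration

__kernel void double_buffered_scale(__global const float *in,
                                    __global float *out,
                                    int num_tiles) {
    __local float buf[2][TILE];

    // Start the copy of the first tile before entering the loop.
    event_t ev = async_work_group_copy(buf[0], in, TILE, 0);

    for (int t = 0; t < num_tiles; ++t) {
        int cur = t & 1;

        // Kick off the transfer of the next tile into the other buffer
        // so it overlaps with the computation on the current tile.
        event_t next_ev = (event_t)0;
        if (t + 1 < num_tiles)
            next_ev = async_work_group_copy(buf[cur ^ 1],
                                            in + (t + 1) * TILE,
                                            TILE, 0);

        // Wait only for the tile we are about to read.
        wait_group_events(1, &ev);

        // Placeholder computation on the current tile.
        for (int i = get_local_id(0); i < TILE; i += get_local_size(0))
            out[t * TILE + i] = buf[cur][i] * 2.0f;

        // Make sure every work-item is done reading buf[cur] before the
        // next iteration issues a copy that overwrites it.
        barrier(CLK_LOCAL_MEM_FENCE);
        ev = next_ev;
    }
}
```

On a device with no warp scheduler, this is roughly what it takes to keep the 
memory interface and the compute units busy at the same time.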

Besides, I noticed that TVM doesn't detect accelerators or custom OpenCL 
devices. Would you be interested in a pull request to fix this?
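
For context, this is what I mean by detecting those devices. A minimal 
host-side sketch (not TVM code), assuming a single platform and omitting most 
error handling; the queried types are the standard `CL_DEVICE_TYPE_ACCELERATOR` 
and `CL_DEVICE_TYPE_CUSTOM` (OpenCL 1.2+) constants:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL platform found\n");
        return 1;
    }

    // The two device classes the post refers to.
    cl_device_type types[] = { CL_DEVICE_TYPE_ACCELERATOR,
                               CL_DEVICE_TYPE_CUSTOM };
    const char *names[]   = { "accelerator", "custom" };

    for (int i = 0; i < 2; ++i) {
        cl_uint n = 0;
        cl_int err = clGetDeviceIDs(platform, types[i], 0, NULL, &n);
        if (err == CL_SUCCESS && n > 0)
            printf("%u %s device(s) found\n", n, names[i]);
        else
            printf("no %s devices on this platform\n", names[i]);
    }
    return 0;
}
```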