Hello, I am working with OpenCL and I am trying to write a new schedule 
optimized for a custom device. Since my device has a large shared memory, I 
tried using double buffering with the conv2d_direct schedule for CUDA.

When I checked the source code of the generated kernels, I noticed that the 
memory cost is indeed doubled, but there is no asynchronous memory transfer to 
actually leverage double buffering. That led me to wonder: is double buffering 
fully supported in TVM for OpenCL?
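For reference, this is the behavior I would expect the generated kernel to exhibit. The sketch below is plain Python (not TVM or OpenCL APIs; all names are illustrative) showing the ping-pong pattern: while tile `i` is being computed, the transfer of tile `i+1` into the other half of the buffer is already in flight, so copy latency is hidden behind compute.

```python
# Minimal sketch of the double-buffering (ping-pong) pattern.
# The thread pool stands in for an asynchronous global->shared copy engine.
from concurrent.futures import ThreadPoolExecutor

def load_tile(data, i, tile):
    # stands in for an async global->shared memory transfer of tile i
    return data[i * tile:(i + 1) * tile]

def compute(tile_data):
    # stands in for the conv2d inner loop over one shared-memory tile
    return sum(tile_data)

def pipeline(data, tile=4):
    n_tiles = len(data) // tile
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(load_tile, data, 0, tile)  # prefetch tile 0
        for i in range(n_tiles):
            buf = pending.result()                # wait for the current tile
            if i + 1 < n_tiles:                   # start the next transfer now,
                pending = copier.submit(load_tile, data, i + 1, tile)
            results.append(compute(buf))          # ...so it overlaps compute
    return results

print(pipeline(list(range(16))))  # -> [6, 22, 38, 54]
```

Doubling the buffer alone (which is what I see in the generated code) only pays the memory cost; without the overlapped transfer, the second half of the buffer gives no benefit.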

---
[Visit Topic](https://discuss.tvm.ai/t/opencl-async-memory-transfer-and-double-buffering/7706/1) to respond.
