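Kernels running on the GPU require every memory access to happen inside a loop that is bound to a thread or block index (for example `threadIdx.x` or `blockIdx.x`). The file you are looking at does not do any thread binding. I suggest looking at this tutorial, which shows how to split loops and bind them to GPU threads and blocks:
https://tvm.apache.org/docs/tutorials/optimize/opt_conv_cuda.html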
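To make "thread binding" concrete, here is a plain-Python simulation (not TVM code) of what a bound 1-D kernel computes for `C[i] = A[i] + B[i]`. The flat loop over `i` is split into an outer axis, one iteration per block, and an inner axis, one iteration per thread. In TVM's tensor-expression schedule API this corresponds roughly to splitting the loop axis and binding the two parts with `te.thread_axis("blockIdx.x")` and `te.thread_axis("threadIdx.x")`. The function name `bound_vector_add` and the block size of 64 are illustrative assumptions, not anything from the file in question.

```python
import math


def bound_vector_add(a, b, threads_per_block=64):
    """Simulate a 1-D GPU launch computing c[i] = a[i] + b[i].

    Hypothetical helper for illustration only. The outer loop plays the
    role of blockIdx.x and the inner loop plays the role of threadIdx.x,
    mirroring a split-then-bind schedule.
    """
    n = len(a)
    # Grid size: enough blocks to cover all n elements.
    num_blocks = math.ceil(n / threads_per_block)
    c = [None] * n
    for block_idx in range(num_blocks):  # would be bound to blockIdx.x
        for thread_idx in range(threads_per_block):  # would be bound to threadIdx.x
            i = block_idx * threads_per_block + thread_idx
            # Bounds guard: the last block may be only partially full.
            if i < n:
                c[i] = a[i] + b[i]
    return c, num_blocks
```

For example, `bound_vector_add(list(range(100)), [1] * 100)` launches 2 blocks of 64 "threads"; the guard keeps the 28 surplus threads of the second block from writing out of bounds.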

---
[Visit Topic](https://discuss.tvm.apache.org/t/matrix-multiplication-example-for-cuda/8078/2) to respond.
