Kernels running on the GPU require every memory access to happen inside a loop that is bound to a thread or a block index. The file you are looking at does not do any thread binding, so it cannot be built for CUDA as written. I suggest looking at this tutorial: https://tvm.apache.org/docs/tutorials/optimize/opt_conv_cuda.html
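
To make "thread binding" concrete, here is a rough plain-Python sketch of what it does to the loop nest of a matrix multiplication. It is not TVM code: `matmul_bound` and `tile` are hypothetical names, and the `by`/`bx`/`ty`/`tx` loops only emulate `blockIdx.y`/`blockIdx.x`/`threadIdx.y`/`threadIdx.x`. In the tutorial's schedule API, the same idea is expressed by splitting an axis with `s[C].split(...)` and attaching the pieces with `s[C].bind(axis, te.thread_axis("blockIdx.x"))` and so on.

```python
import itertools


def matmul_bound(A, B, tile=2):
    """Emulate a matmul whose output loops are split and bound to block/thread indices."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]

    # Number of blocks needed to cover each output dimension (ceiling division).
    blocks_y = (n + tile - 1) // tile
    blocks_x = (m + tile - 1) // tile

    # Outer loops stand in for blockIdx.y / blockIdx.x.
    for by, bx in itertools.product(range(blocks_y), range(blocks_x)):
        # Inner loops stand in for threadIdx.y / threadIdx.x.
        for ty, tx in itertools.product(range(tile), range(tile)):
            i = by * tile + ty  # output row handled by this "thread"
            j = bx * tile + tx  # output column handled by this "thread"
            if i < n and j < m:  # guard for shapes that are not a multiple of tile
                # Every read of A and B and the write to C happen inside bound loops.
                C[i][j] = sum(A[i][r] * B[r][j] for r in range(k))
    return C


A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [0, 1, 3]]
print(matmul_bound(A, B))
```

On a real GPU the four outer loops disappear: each (block, thread) pair runs the loop body once, in parallel. A schedule that leaves the output loops unbound has no such mapping, which is why the file in question needs binding added before it can target CUDA.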