Hi:
Thanks for your answer. I will look at AutoTVM to see how it tunes the grid/block dimensions, since in my experience they have a big effect on performance.

I have another question. I see there is an argument for a **CUDA stream**:

```cpp
CUstream strm = static_cast<CUstream>(CUDAThreadEntry::ThreadLocal()->stream);
```

I couldn't find any documentation about CUDA stream support in TVM. Could you give me a hint about how we could use streams? Thank you very much!
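To make my question concrete, here is how I read that line, as a simplified sketch. I am assuming the stream is a per-host-thread slot that defaults to null (the CUDA default stream) and is read at launch time. All names below (`StreamHandle`, `ThreadEntry`, `SetStream`, `LaunchStream`) are hypothetical stand-ins, not TVM's actual code. As far as I can tell, `TVMSetStream` in the C runtime API is meant to set this per-thread stream, but I am not sure that is the intended way to use it.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for CUstream: an opaque handle.
// nullptr plays the role of the CUDA default stream.
using StreamHandle = void*;

// Hypothetical analogue of CUDAThreadEntry: one instance per host thread.
struct ThreadEntry {
  StreamHandle stream = nullptr;  // starts out as the default stream
  static ThreadEntry* ThreadLocal() {
    thread_local ThreadEntry entry;  // separate copy for every thread
    return &entry;
  }
};

// Hypothetical "set stream" call: updates only the calling thread's slot.
void SetStream(StreamHandle s) { ThreadEntry::ThreadLocal()->stream = s; }

// What a kernel launch on this thread would pass as its stream argument.
StreamHandle LaunchStream() { return ThreadEntry::ThreadLocal()->stream; }

// Reports the stream that a freshly started thread would launch on.
StreamHandle StreamSeenByNewThread() {
  static int sentinel = 0;
  StreamHandle seen = &sentinel;  // overwritten by the worker thread
  std::thread worker([&seen] { seen = LaunchStream(); });
  worker.join();
  return seen;
}
```

If this reading is right, setting a stream on one thread would not affect launches issued from other threads, which would still use the default stream.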