Hi:
Is it possible to auto-tune a single TOPI operator (e.g. conv2d_int8)?
Is there a tutorial for this?
Thank you very much!
---
[Visit
Topic](https://discuss.tvm.ai/t/how-to-auto-tune-a-single-topi-operator/6541/1)
to respond.
You are receiving this because you enabled mailing list mode.
Hi:
I am investigating the capabilities of TVM primitives (CUDA backend), taking
CUTLASS as a baseline highly-optimized CUDA library.
I think most of the optimization techniques used in CUTLASS, such as tiling and
shared-memory management, are supported by TVM primitives.
Streaming is also an important
Hi:
Thanks for your answer. I will check autotvm to see how it tunes the grid/block
dimensions, because in my experience grid/block dims affect performance.
And another question: I see there is an argument for the **cuda stream**:
```
CUstream strm = static_cast<CUstream>(CUDAThreadEntry::ThreadLocal()->stream);
```
Hi:
Thank you for your help!
So, based on my understanding of this code:
in Python,
```
func(a,b,c)
```
will call this
```
void operator()(TVMArgs args,
                TVMRetValue* rv,
                void** void_args) const
```
And grid_dim and block_dim are inferred from **TVMArgs args**?
BTW, I am also wondering whether the TVM stack supports CUDA stream features
like those described in
https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/
---
[Visit
Topic](https://discuss.tvm.ai/t/how-cuda-kernel-is-launched-in-tvm-stack/6167/2)
to respond.
Hi all:
I am learning the TVM CUDA backend. I have a question about how a CUDA kernel
is launched.
Below is my simple test program:
```
import tvm
from tvm import te
import numpy as np

dtype = "float32"
# GEMM size
M = 16; K = 8; N = 16
# declare algorithm
k = te.reduce_axis((0, K), 'k')  # reduction axis over K
A = te.placeholder((M, K), name='A', dtype=dtype)
B = te.placeholder((K, N), name='B', dtype=dtype)
C = te.compute((M, N),
               lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
               name='C')
```
Hello everyone.
I am trying the first tutorial: Compile ONNX Models.
I got this error:
interrupted by signal 11: SIGSEGV
Could anyone give me some tips on how to fix this?
Thank you very much!
TVM: 0.6dev
Ubuntu 18.04