As far as I know, for a single op TVM compiles the high-level IR to CUDA C
source code and then uses nvcc to compile that source into an executable. To
extract the PTX code, you can retrieve the generated source code and compile it
to PTX with the appropriate tools (for example, nvcc).
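A minimal sketch of the "compile the source to PTX" step, assuming you already have the generated CUDA C source as a string (in the TVM versions I have used, something like `cuda_mod.imported_modules[0].get_source()` returns it, but treat that as an assumption for your version). The helper names `nvcc_ptx_command` and `cuda_source_to_ptx` and the default `sm_70` architecture are hypothetical, chosen only for illustration:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def nvcc_ptx_command(cu_path, ptx_path, arch="sm_70", nvcc="nvcc"):
    """Build the nvcc command line that lowers a .cu file to PTX."""
    # --ptx stops after PTX generation; -arch selects the target GPU architecture.
    return [nvcc, "--ptx", f"-arch={arch}", str(cu_path), "-o", str(ptx_path)]


def cuda_source_to_ptx(cuda_source, arch="sm_70", nvcc="nvcc"):
    """Compile CUDA C source text to PTX text.

    Returns None when the nvcc binary cannot be found on PATH.
    """
    if shutil.which(nvcc) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        cu_path = Path(tmp) / "kernel.cu"
        ptx_path = Path(tmp) / "kernel.ptx"
        cu_path.write_text(cuda_source)
        # check=True raises CalledProcessError if nvcc reports a compile error.
        subprocess.run(nvcc_ptx_command(cu_path, ptx_path, arch, nvcc), check=True)
        return ptx_path.read_text()
```

Pick the `arch` value that matches your GPU. If the generated source needs extra include paths or flags on your setup, add them to the list returned by `nvcc_ptx_command`.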
```python
cuda_mod = tvm.build(s
```

Hi @zpu, here is a related discussion: [Quantized models are slower than float
models on GPUs - Questions - Apache TVM
Discuss](https://discuss.tvm.apache.org/t/quantized-models-are-slower-than-float-models-on-gpus/15271/3)
---
I'm a noob who has been learning TVM for just a few weeks. While following the
[**Get Started with
VTA**](https://tvm.apache.org/docs/vta/tutorials/vta_get_started.html#sphx-glr-download-vta-tutorials-vta-get-started-py)
tutorial with my PYNQ board, I noticed that VTA uses RPC to program my board with a
sample bit str