Hello, I am facing a similar problem!
I used autotvm to tune a CNN model trained in TensorFlow, and all the ops in the model were tuned. After that, I loaded the log file with Relay and tested its performance, and found that end-to-end TVM inference is far slower than TensorFlow: the `mod.get_output(0).asnumpy()` call takes about 240 ms! When I test the tuned model, I see the following output:

```
Extract tasks...
Compile...
Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -model=unknown -thread_warp_size=32, workload=('dense_small_batch.cuda', ('TENSOR', (2500, 512), 'float32'), ('TENSOR', (6600, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
```

How can I fix this missing configuration for the workload "dense_small_batch.cuda"?

Looking forward to your reply. Thank you very much!

---

[Visit Topic](https://discuss.tvm.apache.org/t/very-slow-under-linux-cuda/4793/7) to respond.
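P.S. To double-check whether my tuning log actually contains an entry for that workload, I scan the log with plain Python. This is a minimal sketch: the field layout is assumed from the autotvm 0.2 JSON log format (each line is a JSON record whose `"input"` field holds the target string followed by the task name), and the sample record below is made up for illustration.

```python
import json

def logged_workloads(log_lines):
    """Collect the task names recorded in an autotvm JSON log.

    Assumes the autotvm 0.2 log format, where each line is a JSON object
    and record["input"] is [target, task_name, args, kwargs].
    """
    names = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        names.add(record["input"][1])
    return names

# Made-up sample record following the assumed log layout:
sample = [
    '{"input": ["cuda", "conv2d_nchw.cuda", [], {}],'
    ' "config": {"index": 0}, "result": [[0.001], 0, 0, 0], "version": 0.2}'
]

print("dense_small_batch.cuda" in logged_workloads(sample))  # prints False
```

If the workload name never shows up in the log (as in the sample above, which prints `False`), the dense task was never tuned, which would explain why compilation falls back to the default schedule.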