Hello! I am facing a similar problem.

I used AutoTVM to tune a CNN model trained with TensorFlow; all the ops in the model were tuned.

After that, I loaded the log file with Relay and tested its performance. The total TVM inference time turned out to be far higher than TensorFlow's.

The `mod.get_output(0).asnumpy()` call alone takes about 240 ms!

I observed the following messages when testing the TVM-tuned model:

    Extract tasks...
    Compile...
    Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -model=unknown -thread_warp_size=32, workload=('dense_small_batch.cuda', ('TENSOR', (2500, 512), 'float32'), ('TENSOR', (6600, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
How can I fix this missing configuration for the "dense_small_batch.cuda" workload?

Looking forward to your reply! Thank you very much!





---
[Visit Topic](https://discuss.tvm.apache.org/t/very-slow-under-linux-cuda/4793/7) to respond.
