Hi,
I imported the DeepLabV3+ (Xception) model named 'xception65_coco_voc_trainval', downloaded from the TF model zoo (https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md). It runs well on CPU but fails with an error on GPU.

```
# Imports inferred from the names used below (not part of the original snippet).
import numpy as np
import tensorflow as tf
from PIL import Image
import tvm
from tvm import relay
from tvm.contrib import graph_runtime as runtime
import tvm.relay.testing.tf as tf_testing

target = tvm.target.cuda()
ctx = tvm.gpu(0)
model_path = '/tensorflow/deeplabv3_pascal_train_aug/frozen_inference_graph.pb'
image_url = "/deeplab/test_dataset/image3.jpg"

# Load the frozen TensorFlow graph.
with tf.gfile.GFile(model_path, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    graph = tf.import_graph_def(graph_def, name='')
    graph_def = tf_testing.ProcessGraphDefParam(graph_def)

# Prepare a 1x513x513x3 input image.
image = Image.open(image_url)
image_resize = image.convert('RGB').resize((513, 513))
x = np.array(image_resize)
x = np.expand_dims(x, 0)

# Import into Relay and build for CUDA.
mod, params = relay.frontend.from_tensorflow(graph_def, layout='NHWC', shape=x.shape)
with relay.build_config(opt_level=3):
    json, lib, params = relay.build(mod, target=target, params=params)

# Run on the GPU.
module = runtime.create(json, lib, ctx)
input = "ImageTensor"
module.set_input(key=input, value=x, **params)
module.run()
out_0 = module.get_output(0).asnumpy()
```

The error is shown below:

```
[17:36:18] /home/incubator-tvm/src/te/schedule/bound.cc:119: not in feed graph consumer = compute(placeholder_red_temp.repl, 0x11f290c0)
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 256), 'float32'), ('TENSOR', (1, 1, 256, 21), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 256), 'float32'), ('TENSOR', (1, 1, 256, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 304), 'float32'), ('TENSOR', (1, 1, 304, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 1280), 'float32'), ('TENSOR', (1, 1, 1280, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 1, 1, 2048), 'float32'), ('TENSOR', (1, 1, 2048, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 1536), 'float32'), ('TENSOR', (1, 1, 1536, 2048), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 1536), 'float32'), ('TENSOR', (1, 1, 1536, 1536), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 1024), 'float32'), ('TENSOR', (1, 1, 1024, 1536), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 1024), 'float32'), ('TENSOR', (1, 1, 1024, 1024), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 728), 'float32'), ('TENSOR', (1, 1, 728, 1024), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 728), 'float32'), ('TENSOR', (1, 1, 728, 728), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 256), 'float32'), ('TENSOR', (1, 1, 256, 728), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 256), 'float32'), ('TENSOR', (1, 1, 256, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 128), 'float32'), ('TENSOR', (1, 1, 128, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 128), 'float32'), ('TENSOR', (1, 1, 128, 128), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 257, 257, 128), 'float32'), ('TENSOR', (1, 1, 128, 128), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 257, 257, 64), 'float32'), ('TENSOR', (1, 1, 64, 128), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 257, 257, 32), 'float32'), ('TENSOR', (3, 3, 32, 64), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc_winograd_direct.cuda', ('TENSOR', (1, 257, 257, 32), 'float32'), ('TENSOR', (3, 3, 32, 64), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 515, 515, 3), 'float32'), ('TENSOR', (3, 3, 3, 32), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 257, 257, 64), 'float32'), ('TENSOR', (1, 1, 64, 128), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 128), 'float32'), ('TENSOR', (1, 1, 128, 256), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 65, 65, 2048), 'float32'), ('TENSOR', (1, 1, 2048, 256), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda -model=unknown, workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 256), 'float32'), ('TENSOR', (1, 1, 256, 48), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
running TVM deeplab on image ...
Traceback (most recent call last):
  File "/home/incubator-tvm/tutorials/test_tensorflow/deeplab/deeplabv3_plus_tvm.py", line 168, in <module>
    module.run()
  File "/home/incubator-tvm/python/tvm/contrib/graph_runtime.py", line 177, in run
    self._run()
  File "/home/incubator-tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 225, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /home/incubator-tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f5094810c65]
  [bt] (2) /home/incubator-tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xb6) [0x7f50948aac56]
  [bt] (1) /home/incubator-tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x9df) [0x7f50948aaa5f]
  [bt] (0) /home/incubator-tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x82) [0x7f5093e640b2]
  File "/home/incubator-tvm/src/runtime/cuda/cuda_module.cc", line 105
  File "/home/incubator-tvm/src/runtime/library_module.cc", line 78
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

Process finished with exit code 1
```

I tried tuning it with AutoTVM, but for some conv ops the GFLOPS is always 0.00:

```
[Task 17/26] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (132/1296) | 50.07 s
WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
```
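To make the long warning list easier to read, the untuned workloads can be grouped by operator name. This is only a minimal sketch using the Python standard library: `untuned_ops`, `WARNING_RE`, and `sample_log` are hypothetical names for illustration, and the regex assumes the warning format shown in the log above, with two of those warnings pasted in as sample input.

```python
import re
from collections import Counter

# Matches the "Cannot find config" warnings in the format shown in the log above.
# Assumption: the operator name is the first quoted item inside workload=(...).
WARNING_RE = re.compile(
    r"Cannot find config for target=(?P<target>.+?), workload=\('(?P<op>[\w.]+)'"
)


def untuned_ops(log_text):
    """Count how many fallback warnings each operator name produced."""
    return Counter(m.group("op") for m in WARNING_RE.finditer(log_text))


# Two warnings copied from the log, used here as sample input.
sample_log = (
    "WARNING:autotvm:Cannot find config for target=cuda -model=unknown, "
    "workload=('conv2d_nhwc.cuda', ('TENSOR', (1, 129, 129, 256), 'float32'), "
    "('TENSOR', (1, 1, 256, 21), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). "
    "A fallback configuration is used, which may bring great performance regression.\n"
    "WARNING:autotvm:Cannot find config for target=cuda -model=unknown, "
    "workload=('conv2d_nhwc_winograd_direct.cuda', ('TENSOR', (1, 257, 257, 32), 'float32'), "
    "('TENSOR', (3, 3, 32, 64), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). "
    "A fallback configuration is used, which may bring great performance regression.\n"
)

counts = untuned_ops(sample_log)
for op, n in sorted(counts.items()):
    print(op, n)
```

Running the same function over the full log would show how many warnings come from `conv2d_nhwc.cuda` versus `conv2d_nhwc_winograd_direct.cuda`; it does not by itself say which kernel produces the invalid PTX.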