I also found that it took a lot of time to wait for each operator results, so I put multiple operators in a single command buffer and apply the lazy execution. But I'm not familiar with TVM, and the code in the PR seems to be very different from ncnn, so I cannot judge if this PR is the best, but it must be much better than the current TVM approach I think.
If you have any questions about the ncnn vulkan backend, feel free to post an issue to https://github.com/Tencent/ncnn/issues -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmlc/tvm/pull/3849#issuecomment-527072640