[TVM Discuss] [Questions] [ TOPI ] Winograd convolution performance is too slow

2020-03-31 Thread ckh via TVM Discuss
I tried a bigger channel count for the image and weight, like below: `img_shape = (1, 512, 224, 224)`, `w_shape = (256, 512, 3, 3)`. The shape format is NCHW. The result is direct => 50.641 ms, winograd => 604.84 ms, so the performance is still worse than direct conv2d... Should I use more channels?
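As a back-of-envelope check (my sketch, not the poster's code; the F(2x2, 3x3) tile size, stride 1, no padding, and perfect tiling are assumptions), the theoretical multiplication counts for these shapes can be computed directly:

```python
# Rough multiplication counts for direct vs. Winograd F(2x2, 3x3) conv2d.
# Counts multiplies only; it ignores the input/filter/output transform
# overhead and memory traffic that dominate real runtime.
N, C, H, W = 1, 512, 224, 224   # img_shape (NCHW)
K, _, R, S = 256, 512, 3, 3     # w_shape

out_h, out_w = H - R + 1, W - S + 1           # stride 1, no padding
direct_mults = N * K * C * out_h * out_w * R * S

m, r = 2, 3                                    # F(2x2, 3x3)
tile = m + r - 1                               # 4x4 input tile
tiles = (out_h // m) * (out_w // m)
winograd_mults = N * K * C * tiles * tile * tile

print(f"direct:   {direct_mults:,}")
print(f"winograd: {winograd_mults:,}")
print(f"theoretical reduction: {direct_mults / winograd_mults:.2f}x")  # 2.25x
```

So on paper Winograd needs ~2.25x fewer multiplies here; a measured 12x slowdown therefore points at the transform overhead and the schedule, not at the multiply count.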

[TVM Discuss] [Questions] [ TOPI ] Winograd convolution performance is too slow

2020-03-31 Thread masahi via TVM Discuss
Try a bigger number of channels. Winograd is slow for small channels. --- [Visit Topic](https://discuss.tvm.ai/t/topi-winograd-convolution-performance-is-too-slow/6161/2) to respond.

[TVM Discuss] [Questions] [ TOPI ] Winograd convolution performance is too slow

2020-03-31 Thread ckh via TVM Discuss
Hello! I am currently comparing the performance of direct conv2d and Winograd conv2d using TOPI. However, in my experiments, conv2d using the Winograd algorithm performs much worse than direct. The code below is what I experimented with. ## data shape data_shape = (1,

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread jonso via TVM Discuss
Awesome, thanks a lot @trevor-m. One more quick question before I try it out - what data type is DLTensor->data? The `codegen_c` base casts it to the type of the corresponding function argument (in my case, input is a `float*` and input_mask is an `int*`).

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread Trevor Morris via TVM Discuss
Hi @jonso, when I do relay.build with target="cuda", the data inputs supplied to my runtime module are already placed on the GPU by the graph runtime. The DLTensor->data will be a device pointer to the data in GPU memory, and you can pass this directly to CUDA libraries. If you need to get the
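For reference, `DLTensor` comes from DLPack and its `data` field is an untyped `void*`, which is why `codegen_c` casts it per argument. A minimal ctypes sketch (field layout mirrored from `dlpack.h` as I understand it; host memory only for illustration -- on a CUDA target `data` would instead be a device pointer you cannot dereference from the host):

```python
import ctypes

# Mirror of DLPack's DLTensor and its sub-structs (layout assumed from
# dlpack.h). This runs on host memory purely to illustrate the cast.
class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32),
                ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8),
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),   # untyped: cast per argument
                ("device", DLDevice),
                ("ndim", ctypes.c_int32),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),
                ("byte_offset", ctypes.c_uint64)]

# A host buffer standing in for the tensor's storage.
buf = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
shape = (ctypes.c_int64 * 1)(4)
t = DLTensor(data=ctypes.cast(buf, ctypes.c_void_p),
             device=DLDevice(1, 0),          # kDLCPU = 1
             ndim=1,
             dtype=DLDataType(2, 32, 1),     # kDLFloat = 2, 32 bits
             shape=shape)

# What codegen_c effectively does: cast void* data to the declared type.
as_float = ctypes.cast(t.data, ctypes.POINTER(ctypes.c_float))
print([as_float[i] for i in range(4)])       # [1.0, 2.0, 3.0, 4.0]
```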

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread Zhi via TVM Discuss
@jonso if you can get into `GetFunction` in the external module, it means there is no problem with runtime symbol lookup. Can you check if the input data is correct? For example, the data you have in the external runtime should come from here: https://github.com/apache/incubator-tvm/blob/master/

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread Cody H. Yu via TVM Discuss
Ah I see. One reason might be an empty host module in this case. I'd call out @trevor-m since he has experience offloading subgraphs to TRT while keeping the rest on CUDA. --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-with-cuda-target/6159/4) to respond.

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread jonso via TVM Discuss
Sorry about that, I think I misspoke. I already have the annotation pass set up properly and my codegen is being called. However, when I try to print out one of my inputs from my codegen, the program crashes. I have a feeling that since the target is “cuda”, the data isn’t being moved from G

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread Cody H. Yu via TVM Discuss
No, that's a different flow. TVM itself already has cuBLAS and cuDNN support ([example](https://github.com/apache/incubator-tvm/blob/master/python/tvm/contrib/cudnn.py)). If you set the target with `-libs`, it uses the TVM builtin one instead of your codegen. To use your codegen, now you

[TVM Discuss] [Questions] [External CodeGen] Status of Annotating composite functions?

2020-03-31 Thread adb via TVM Discuss
Thank you @comaniac and @matt-arm. I will see if MergeCompilerRegions is something that will work for us; otherwise I will keep an eye out for your PR. --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-status-of-annotating-composite-functions/6150/4) to respond.

[TVM Discuss] [Questions] External codegen with CUDA target

2020-03-31 Thread jonso via TVM Discuss
Hey @zhiics and @comaniac, I am working on an external codegen that will run on GPU. My external codegen module is a CSourceModule. The code generated in this module will call some CUDA APIs. If I go through the external codegen workflow and set the target to `cuda -libs=cublas,cudnn`, will

[TVM Discuss] [Questions] [External CodeGen] Status of Annotating composite functions?

2020-03-31 Thread Matt Barrett via TVM Discuss
AnnotateTarget doesn't support composite functions at the moment. I intend to send a PR to resolve this very soon (hopefully this week). You can use MergeCompilerRegions if you like, but this will only be applicable if you also support conv2d, bias and relu individually as well as merged.

[TVM Discuss] [Questions] Can convolution using Winograd algorithm be slower than Direct convolution?

2020-03-31 Thread ckh via TVM Discuss
Hello! I am currently building a network using the Winograd algorithm, and the problem is that its performance is lower than direct conv2d. According to the paper [Fast Algorithms for Convolutional Neural Networks](https://arxiv.org/pdf/1509.09308.pdf), performance should be higher than a direct implementation as
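To make the paper's claim concrete, here is a minimal 1D Winograd F(2,3) sketch (the algorithm from Lavin & Gray's paper, not the TOPI implementation): it produces two outputs of a 3-tap filter from a 4-element input tile with 4 multiplies instead of the direct method's 6.

```python
# 1D Winograd F(2,3): 4 multiplies instead of 6 for two outputs.
def winograd_f2_3(d, g):
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g):
    # Sliding dot product (CNN-style correlation), 6 multiplies.
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
print(winograd_f2_3(d, g))  # same values as direct(d, g)
```

The saving is in multiplies only: the filter-dependent terms can be precomputed, but the input and output transforms (additions and data movement) are extra work, which is one reason the measured speedup can disappear or reverse depending on shapes and schedules.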

[TVM Discuss] [Questions] Extract Constant Size Feature

2020-03-31 Thread Jaehun Ryu via TVM Discuss
The XGBoost cost model extracts features with several methods (itervar, curve, knobs), but the feature size depends on the kind and number of knobs. Is there any other method to extract a constant-size feature vector from arbitrary schedules and tasks?
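One generic workaround (a hypothetical sketch of mine, not a TVM API) is to pad or truncate the variable-length feature vector to a fixed length before feeding it to the model:

```python
# Force a variable-length feature vector to a constant size by padding
# with zeros or truncating. The input values here are placeholders for
# whatever schedule-dependent features are extracted.
def to_fixed_size(features, size, pad_value=0.0):
    features = list(features)[:size]                   # truncate if too long
    features += [pad_value] * (size - len(features))   # pad if too short
    return features

print(to_fixed_size([3, 1, 4, 1, 5], 8))  # [3, 1, 4, 1, 5, 0.0, 0.0, 0.0]
print(to_fixed_size([3, 1, 4, 1, 5], 3))  # [3, 1, 4]
```

Padding preserves all features up to the cap at the cost of wasted dimensions; truncation loses information, so the cap should be chosen from the largest knob set across the tasks being compared.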