[TVM Discuss] [Questions] Deformable conv implementations differences between pytorch torchvision & tvm

2020-09-01 Thread masahi via TVM Discuss
Yes, I remember TVM's implementation of deformable conv is modeled after MXNet. --- [Visit Topic](https://discuss.tvm.ai/t/deformable-conv-implementations-differences-between-pytorch-torhcvision-tvm/7702/5) to respond.

[TVM Discuss] [Questions] Unclear error from TE Inliner

2020-08-30 Thread masahi via TVM Discuss
Hi, I'm trying to compile the following trivial function:

```
fn (%p0: Tensor[(3), bool], Primitive=1) -> Tensor[(3), int32] {
  where(%p0, 1 /* ty=int32 */, -1 /* ty=int32 */) /* ty=Tensor[(3), int32] */
}
```

Note that the second and third args to `where` are scalars. Since this is not suppor…
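For reference, a hedged reconstruction of that function with the Python-side Relay API (whether it reproduces the exact inliner error may depend on the TVM version):

```
import tvm
from tvm import relay

# Rebuild the function above: a boolean condition tensor with scalar
# int32 constants as the two branches of where.
p0 = relay.var("p0", shape=(3,), dtype="bool")
body = relay.where(p0, relay.const(1), relay.const(-1))
mod = tvm.IRModule.from_expr(relay.Function([p0], body))
```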

[TVM Discuss] [Questions] Deformable conv implementations differences between pytorch torchvision & tvm

2020-08-23 Thread masahi via TVM Discuss
The input names passed to `set_input` shouldn't be 0, 1, etc., but the corresponding variable names like "data", "weight", "offset", "input0", etc. Can you check whether this change gives the correct output?
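A minimal sketch of what that looks like (the graph/lib/params from `relay.build` and the numpy arrays are placeholders; use whatever variable names your converted Relay function actually has):

```
import tvm
from tvm.contrib import graph_runtime

# graph, lib, params come from relay.build(...); ctx matches the build target.
ctx = tvm.gpu(0)
m = graph_runtime.create(graph, lib, ctx)
m.set_input(**params)
m.set_input("data", tvm.nd.array(data_np, ctx))      # not m.set_input(0, ...)
m.set_input("offset", tvm.nd.array(offset_np, ctx))
m.run()
out = m.get_output(0).asnumpy()
```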

[TVM Discuss] [Questions] How to setting model compiled from pytorch with mutable input size

2020-05-28 Thread masahi via TVM Discuss
No, the input shape needs to be fixed. --- [Visit Topic](https://discuss.tvm.ai/t/how-to-setting-model-compiled-from-pytorch-with-mutable-input-size/6827/2) to respond.

[TVM Discuss] [Questions] CUDA -libs=cudnn performance

2020-05-19 Thread masahi via TVM Discuss
conv2d, conv3d and softmax. --- [Visit Topic](https://discuss.tvm.ai/t/cuda-libs-cudnn-performance/6700/4) to respond.
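As a rough sketch (the model and params are placeholders), the cuDNN path is selected through the target string; ops without a cuDNN lowering still go through TVM's own CUDA schedules:

```
from tvm import relay

target = "cuda -libs=cudnn"
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```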

[TVM Discuss] [Questions] CUDA -libs=cudnn performance

2020-05-19 Thread masahi via TVM Discuss
conv_transpose won't be run on cudnn even if you specify `-libs=cudnn`. Does this answer your question? --- [Visit Topic](https://discuss.tvm.ai/t/cuda-libs-cudnn-performance/6700/2) to respond.

[TVM Discuss] [Questions] Execution order of operators at Runtime in TVM

2020-05-06 Thread masahi via TVM Discuss
See the DNNL example below. Since the TVM runtime is sequential, there is no synchronization of any kind. You just deal with pointers to tensors via `DLTensor`: https://github.com/apache/incubator-tvm/tree/master/src/runtime/contrib/dnnl https://github.com/apache/incubator-tvm/tree/master/src/relay/backen…

[TVM Discuss] [Questions] Execution order of operators at Runtime in TVM

2020-05-06 Thread masahi via TVM Discuss
Yes, two ops, even if they are independent, are run sequentially. This is the code that executes operators: https://github.com/apache/incubator-tvm/blob/master/src/runtime/graph/graph_runtime.cc#L55-L57 If you have a custom HW and you are interested in inter-op parallelism, you should be looki…

[TVM Discuss] [Questions] Execution order of operators at Runtime in TVM

2020-05-05 Thread masahi via TVM Discuss
It's simple: we support only intra-operator parallelism, not inter-operator parallelism. We use threads for parallelizing the outermost loop of convolution, for example. --- [Visit Topic](https://discuss.tvm.ai/t/execution-order-of-operators-at-runtime-in-tvm/6572/10) to respond.
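A toy `te` example of that intra-operator parallelism (a plain matmul rather than the real conv2d schedule; all names here are illustrative): the outermost loop is the one handed to the CPU thread pool.

```
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
s[C].parallel(C.op.axis[0])          # threads split the outermost loop
func = tvm.build(s, [A, B, C], target="llvm")
```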

[TVM Discuss] [Questions] Inconsistent params size of optimized models vs non-optimized

2020-04-21 Thread masahi via TVM Discuss
Can you try this?

```
with relay.build_config(opt_level=3, disabled_pass=["AlterOpLayout"]):
    ...
```

If my memory is right, AlterOpLayout is what enables the winograd weight transform at compile time. I agree a 1.5x increase is pretty bad. Since the weight transform is cheap, I don't think the perf hit w…
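A hedged sketch of how you could compare the two builds (`mod`, `target`, and `input_params` are placeholders for your own model); `relay.save_param_dict` gives the serialized size, which should shrink once the winograd weight transform is no longer baked in at compile time:

```
from tvm import relay

def build_and_measure(disabled_passes):
    with relay.build_config(opt_level=3, disabled_pass=disabled_passes):
        graph, lib, params = relay.build(mod, target=target, params=input_params)
    return len(relay.save_param_dict(params))

print("default          :", build_and_measure([]))
print("no AlterOpLayout :", build_and_measure(["AlterOpLayout"]))
```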

[TVM Discuss] [Questions] ONNX type mismatch when building with opt level 1

2020-04-14 Thread masahi via TVM Discuss
Yeah, I also remember being annoyed by this int32 vs int64 issue. I sent some PRs below, but I don't have a good solution. https://github.com/apache/incubator-tvm/pull/4573 https://github.com/apache/incubator-tvm/pull/4528 Fortunately, now that we have a PyTorch frontend, I don't need to deal…

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread masahi via TVM Discuss
[quote="kindlehe, post:26, topic:6256"] Will tvm consider integrating FBGEMM to get the same heavy lifting in the future as pytorch has done to support the same high speedup in avx2 device? [/quote] No. We should rather improve our avx2 schedule to match FBGEMM performance. --- [Visit Top

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread masahi via TVM Discuss
Yes, it is incredible. Quantized Torch uses FBGEMM https://github.com/pytorch/FBGEMM to do the heavy lifting. They JIT-generate asm. I have no idea how their quantized convolution is implemented. You can take a look at their code.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread masahi via TVM Discuss
Yes, the int16 thing is intended. See https://github.com/apache/incubator-tvm/pull/4307. @anijain2305 can give more details. Int8 is only enabled for AVX512. --- [Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256/23) to respond.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread masahi via TVM Discuss
No, but I think @anijain2305 has done such a comparison on a Raspberry Pi 4. --- [Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256/16) to respond.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-09 Thread masahi via TVM Discuss
Hmm, I don't know why TVM is faster on MobileNet v3. Maybe it's because this is a newer model that the Torch team hasn't optimized for. But please make sure you are setting the `TVM_NUM_THREADS` env var correctly (it should be the number of physical cores). The numbers seem consistent with what I've seen in…
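For example (the core count here is a placeholder), the variable should be set before TVM's thread pool is created; setting it before importing tvm is the safe option:

```
import os
os.environ["TVM_NUM_THREADS"] = "4"   # number of physical cores on this machine

import tvm  # import (and run) TVM only after the env var is set
```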

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-08 Thread masahi via TVM Discuss
1. I don't have experience using QAT in Torch. I think post-training quantization is easier to work with. In any case, post-training quantization should be the first thing to try (rough sketch below); if you need extra accuracy, QAT may help.
2. Yes. See https://docs.tvm.ai/tutorials/frontend/deploy_qua…
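As mentioned in point 1, a rough post-training static quantization sketch with PyTorch's eager-mode API (the model, the calibration loader, and the input shape are placeholders); the resulting traced module is what the TVM PyTorch frontend consumes:

```
import torch

model.eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)

with torch.no_grad():
    for images, _ in calib_loader:        # a handful of calibration batches
        prepared(images)

quantized = torch.quantization.convert(prepared)
scripted = torch.jit.trace(quantized, torch.rand(1, 3, 224, 224))
```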

[TVM Discuss] [Questions] [External CodeGen] Status of Annotating composite functions?

2020-04-07 Thread masahi via TVM Discuss
@adb The PR is up https://github.com/apache/incubator-tvm/pull/5272 --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-status-of-annotating-composite-functions/6150/9) to respond.

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-07 Thread masahi via TVM Discuss
[quote="anijain2305, post:4, topic:6256"] do not see tutorial to be very different from FP32 compilation [/quote] Yes, for tflite where you can just download pre-quantized model from their zoo, I don't think it would be different from fp32. For PyTorch it is a bit more complicated :) All the b

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-07 Thread masahi via TVM Discuss
[quote="kindlehe, post:1, topic:6256"] but it is very hard to find official tutorial about how to do quantization for pytorch or tf correctiy [/quote] Yes, this is a good point. @anijain2305 do we have a plan to send a tutorial for how to convert from pre-quantized models? --- [Visit Top

[TVM Discuss] [Questions] Is there any speed comparison of quantization on cpu

2020-04-07 Thread masahi via TVM Discuss
Yes, without HW support for int8, you shouldn't expect int8 to be any faster than fp32. For AVX2, Torch is much faster than TVM for int8. For AVX512, where int8 does make a difference, TVM is much faster. I have a script https://github.com/Edgecortix-Inc/pytorch_quantization/tree/master/tvm_…
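The HW capability is expressed through the LLVM target string, roughly like this (the `-mcpu` values are just examples for AVX2 and AVX-512 machines; `mod`/`params` are placeholders):

```
from tvm import relay

target = "llvm -mcpu=skylake-avx512"      # or "llvm -mcpu=core-avx2"
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```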

[TVM Discuss] [Questions] [External CodeGen] Status of Annotating composite functions?

2020-04-07 Thread masahi via TVM Discuss
@adb I had an old PR https://github.com/apache/incubator-tvm/pull/4741 which demonstrates conv + bias + relu fusion the "hard" way (before composite was introduced). I'll send a new one ASAP after the PRs by @matt-arm are merged.

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
In my case, these intermediate structs are strongly tied to our executor. They are plain structs, so much easier to work with than full-blown Relay IR. So for me they are not really overhead. --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-constant-tensors-in-c-codegen/5890/25) to respond.

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
@trevor-m Thanks for confirming. I can't talk about specifics, but let's just say any C++ serialization lib should be able to serialize/deserialize structs into a binary blob, and I am just using one of them. Note that I'm not serializing the Relay subgraph as it is, but some structs that get con…

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
I think the TensorRT integration by AWS works in a similar way. If I remember correctly, they use JSON instead of binary. --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-constant-tensors-in-c-codegen/5890/20) to respond.

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
"The executor" part, including API calls to DNNL, is defined in another lib that is built outside of TVM, and linked to my TVM build. My TVM external runtime passes binary or deserialized graph rep together with arguements from TVM to that lib, and this lib knows how to execute the graph. The

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
Hmm, for my use case, I simply serialize a Relay subgraph into some binary format, pass the binary to the runtime module, and deserialize it there. Then I can execute this graph with the arguments I receive from TVM in whatever way I like, including offloading to DNNL. This week I integrated upstream chan…

[TVM Discuss] [Questions] [External Codegen] Constant tensors in c-codegen

2020-04-03 Thread masahi via TVM Discuss
@matt-arm Have you considered using a different codegen than CSource? To deal with large constants, I think a binary-serialization-based codegen is a good fit. --- [Visit Topic](https://discuss.tvm.ai/t/external-codegen-constant-tensors-in-c-codegen/5890/14) to respond.

[TVM Discuss] [Questions] [QUANTIZATION][PYTORCH] Suitable pytorch api setting for relay quantization

2020-04-03 Thread masahi via TVM Discuss
I'm not sure what you are asking. Whatever qconfig you quantize your Torch model with, the converted Relay model is equivalent to the quantized Torch model. But due to the difference in numerics, the raw floating-point outputs of quantized Torch models and converted Relay models can be sl…
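So when checking a converted model, a hedged approach is to compare the raw float outputs with a tolerance (or compare predicted labels) rather than expecting bit-exact results; the models, input, and thresholds below are placeholders:

```
import numpy as np
import torch

with torch.no_grad():
    torch_out = quantized_torch_model(inp).numpy()

m.run()                                   # TVM graph runtime module fed with the same input
tvm_out = m.get_output(0).asnumpy()

print("max abs diff:", np.max(np.abs(torch_out - tvm_out)))
print("close enough:", np.allclose(torch_out, tvm_out, rtol=1e-2, atol=1e-2))
```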

[TVM Discuss] [Questions] How CUDA kernel is launched in TVM stack

2020-04-02 Thread masahi via TVM Discuss
I don't think we expose the CUDA stream abstraction to the Python frontend. We typically don't care about CUDA streams (we don't support any concurrency at runtime). What is your use case? --- [Visit Topic](https://discuss.tvm.ai/t/how-cuda-kernel-is-launched-in-tvm-stack/6167/7) to respond.

[TVM Discuss] [Questions] How CUDA kernel is launched in TVM stack

2020-04-01 Thread masahi via TVM Discuss
Correct. You can tweak the schedule to change the launch config, but as a user you shouldn't care about the exact grid/block size. If you really want the best perf, use autotvm to tune your schedule; the resulting grid/block size is optimal based on real measurement.
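A rough autotvm sketch along those lines (the trial count, log file name, and model are placeholders):

```
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

tasks = autotvm.task.extract_from_program(mod["main"], target="cuda", params=params)
for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(
        n_trial=1000,
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=10),
        ),
        callbacks=[autotvm.callback.log_to_file("tuning.log")],
    )

# Rebuild with the best measured configs (and hence grid/block sizes) applied.
with autotvm.apply_history_best("tuning.log"):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target="cuda", params=params)
```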

[TVM Discuss] [Questions] [CI][LINT] Enabling clang-format based lint checks

2020-04-01 Thread masahi via TVM Discuss
A related PR with more discussion https://github.com/apache/incubator-tvm/pull/5202 --- [Visit Topic](https://discuss.tvm.ai/t/ci-lint-enabling-clang-format-based-lint-checks/6170/2) to respond.

[TVM Discuss] [Questions] How CUDA kernel is launched in TVM stack

2020-04-01 Thread masahi via TVM Discuss
The answer is that we use the CUDA driver API to launch kernels from C++ code. `kernel<<<...>>>(a, b, c)` is not the only way to launch a kernel, and it requires compiling with NVCC. See https://github.com/apache/incubator-tvm/blob/e0122c0ea68043372220e4e02b81692c34832227/src/runtime/cuda/cuda_module.cc#L1…

[TVM Discuss] [Questions] [TOPI] Winograd convolution performance is too slow

2020-03-31 Thread masahi via TVM Discuss
Try a bigger number of channels. Winograd is slow for small channel counts. --- [Visit Topic](https://discuss.tvm.ai/t/topi-winograd-convolution-performance-is-too-slow/6161/2) to respond.

[TVM Discuss] [Application] Deployment to Pytorch/dlpack

2020-03-23 Thread masahi via TVM Discuss
Unfortunately we don't have any pip package at the moment, but a runtime-only package sounds reasonable. cc @tqchen I'd imagine you'd build the TVM code outside of Torch first and export the build artifact as a shared lib. And from Torch you can load the TVM-generated shared lib in either Python cod…
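One hedged sketch of that flow (the paths, input name, and shapes are placeholders): build and export outside Torch, load the shared lib back, and hand tensors across via dlpack.

```
import torch
import tvm
from tvm.contrib import graph_runtime

lib.export_library("deploy_lib.so")                  # lib comes from relay.build(...)

loaded = tvm.runtime.load_module("deploy_lib.so")
m = graph_runtime.create(graph, loaded, tvm.cpu(0))  # graph json from the same build
m.set_input(**params)

x = torch.rand(1, 3, 224, 224)
m.set_input("input0", tvm.nd.from_dlpack(torch.utils.dlpack.to_dlpack(x)))
m.run()
out = torch.utils.dlpack.from_dlpack(m.get_output(0).to_dlpack())
```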

[TVM Discuss] [Questions] WARNING:root:Untyped Tensor found, assume it is float

2020-03-21 Thread masahi via TVM Discuss
Yes, unless you really have integer tensors somewhere, which is highly unlikely, you can ignore this warning. This warning was added for the case of converting Torch models jitted by `torch.jit.script(...)`. In scripted models, there is no way to tell the type of the input tensors, so we just assume it is f…
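For context, a hedged sketch of where the warning shows up (the model, input name, and shape are placeholders); with `torch.jit.trace` the dtypes are known, while with `torch.jit.script` the frontend falls back to assuming float32:

```
import torch
from tvm import relay

scripted = torch.jit.script(model)       # dtypes unknown -> warning, float32 assumed
shape_list = [("input0", (1, 3, 224, 224))]
mod, params = relay.frontend.from_pytorch(scripted, shape_list)
```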