Yes, I remember TVM's implementation of deformable conv is modeled after MXNet.
---
Hi, I'm trying to compile the following trivial function:
```
fn (%p0: Tensor[(3), bool], Primitive=1) -> Tensor[(3), int32] {
where(%p0, 1 /* ty=int32 */, -1 /* ty=int32 */) /* ty=Tensor[(3), int32] */
}
```
Note that the second and third args to `where` are scalars. Since this is not supported…
The input names passed to `set_input` shouldn't be 0, 1, etc., but the corresponding variable names like "data", "weight", "offset", "input0", etc. Can you try this change and see if it gives the correct output?
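For reference, here is a minimal sketch of what I mean (the toy graph, names, and shapes below are my own, not taken from your model): inputs are fed by Relay variable name, not by position.
```
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Hypothetical toy graph; the point is only that set_input takes the Relay
# variable names ("data", "weight"), not positional indices like 0 or 1.
data = relay.var("data", shape=(1, 4), dtype="float32")
weight = relay.var("weight", shape=(8, 4), dtype="float32")
func = relay.Function([data, weight], relay.nn.dense(data, weight))

graph, lib, params = relay.build(tvm.IRModule.from_expr(func), target="llvm")
rt = graph_runtime.create(graph, lib, tvm.cpu(0))
rt.set_input("data", np.ones((1, 4), "float32"))      # by name, not 0
rt.set_input("weight", np.ones((8, 4), "float32"))    # by name, not 1
rt.run()
print(rt.get_output(0))
```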
---
No, the input shape needs to be fixed.
---
conv2d, conv3d and softmax.
---
conv_transpose won't run on cuDNN even if you specify `-libs=cudnn`. Does this answer your question?
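To illustrate (a rough sketch with placeholder shapes, not from the original thread): cuDNN offloading is selected via the target string, and only ops with cuDNN implementations get dispatched to it.
```
import tvm
from tvm import relay

# Placeholder shapes. conv2d can be offloaded to cuDNN via -libs=cudnn;
# conv2d_transpose would still go through TVM's own CUDA schedule.
data = relay.var("data", shape=(1, 16, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(32, 16, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target="cuda -libs=cudnn")
```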
---
See the DNNL example below. Since the TVM runtime is sequential, there is no synchronization of any kind. You just deal with pointers to tensors via DLTensor:
https://github.com/apache/incubator-tvm/tree/master/src/runtime/contrib/dnnl
https://github.com/apache/incubator-tvm/tree/master/src/relay/backen
Yes, two ops, even if they are independent, are run sequentially. This is the
code that executes operators:
https://github.com/apache/incubator-tvm/blob/master/src/runtime/graph/graph_runtime.cc#L55-L57
If you have custom HW and you are interested in inter-op parallelism, you should be looking…
It's simple: we support only intra-operator parallelism, not inter-operator parallelism. We use threads for parallelizing the outermost loop of a convolution, for example.
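As a rough illustration of what intra-operator parallelism means (a toy example of mine, not TVM's actual conv2d schedule):
```
import tvm
from tvm import te

# Toy elementwise op. s[C].parallel splits the outermost loop across the
# runtime's thread pool; that is the only kind of parallelism TVM uses.
n = 1 << 16
A = te.placeholder((n,), name="A")
C = te.compute((n,), lambda i: A[i] * 2.0, name="C")
s = te.create_schedule(C.op)
s[C].parallel(C.op.axis[0])
print(tvm.lower(s, [A, C], simple_mode=True))  # note the "parallel" loop
```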
---
Can you try this?
```
with relay.build_config(opt_level=3, disabled_pass=["AlterOpLayout"]):
...
```
If my memory is right, AlterOpLayout is what enables the Winograd weight transform at compile time. I agree a 1.5x increase is pretty bad. Since the weight transform is cheap, I don't think the perf hit would…
Yeah, I also remember being annoyed by this int32 vs int64 issue. I sent the PRs below, but I don't have a good solution.
https://github.com/apache/incubator-tvm/pull/4573
https://github.com/apache/incubator-tvm/pull/4528
Fortunately, now that we have the PyTorch frontend, I don't need to deal…
[quote="kindlehe, post:26, topic:6256"]
Will tvm consider integrating FBGEMM to get the same heavy lifting in the
future as pytorch has done to support the same high speedup in avx2 device?
[/quote]
No. We should rather improve our avx2 schedule to match FBGEMM performance.
---
Yes, it is incredible. Quantized Torch uses FBGEMM (https://github.com/pytorch/FBGEMM) to do the heavy lifting. They JIT-generate asm. I have no idea how their quantized convolution is implemented. You can
take a look at their code.
---
Yes, the int16 thing is intended. See https://github.com/apache/incubator-tvm/pull/4307; @anijain2305 can give more details.
Int8 is only enabled for AVX512.
---
No, but I think @anijain2305 has done such a comparison on a Raspberry Pi 4.
---
Hmm, I don't know why TVM is faster on MobileNet v3. Maybe it's because this is a newer model that the Torch team hasn't optimized for yet. But please make sure you are setting the `TVM_NUM_THREADS` env var correctly (it should be the number of physical cores).
The numbers seem consistent with what I've seen in…
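On the `TVM_NUM_THREADS` point, a trivial sketch (the core count below is a placeholder): the env var just needs to be set before the TVM runtime spins up its thread pool.
```
import os

# Placeholder value; set this to the number of *physical* cores on your machine.
os.environ["TVM_NUM_THREADS"] = "4"

import tvm  # imported after setting the env var
```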
1. I don't have experience using QAT in Torch. I think post-training quantization is easier to work with. In any case, post-training quantization should be the first thing you try (see the sketch below). If you need extra accuracy, QAT may help.
2. Yes. See
https://docs.tvm.ai/tutorials/frontend/deploy_qua
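In case it helps, here is a rough sketch of post-training static quantization in Torch (my own example using a torchvision model; details vary per model), which you can then trace and feed to relay.frontend.from_pytorch:
```
import torch
from torchvision.models.quantization import resnet18

# Post-training static quantization of a pre-fused torchvision model.
model = resnet18(pretrained=True, quantize=False).eval()
model.fuse_model()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.rand(1, 3, 224, 224))              # calibration forward pass
torch.quantization.convert(model, inplace=True)

# Trace the quantized model so it can be converted to Relay.
script_module = torch.jit.trace(model, torch.rand(1, 3, 224, 224)).eval()
```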
@adb The PR is up https://github.com/apache/incubator-tvm/pull/5272
---
[quote="anijain2305, post:4, topic:6256"]
do not see tutorial to be very different from FP32 compilation
[/quote]
Yes, for TFLite, where you can just download a pre-quantized model from their zoo, I don't think it would be different from fp32. For PyTorch it is a bit more complicated :) All the b…
[quote="kindlehe, post:1, topic:6256"]
but it is very hard to find official tutorial about how to do quantization for
pytorch or tf correctiy
[/quote]
Yes, this is a good point. @anijain2305, do we have a plan to add a tutorial on how to convert from pre-quantized models?
---
Yes, without HW support for int8, you shouldn't expect int8 to be any faster
than fp32. For avx2, Torch is much faster than TVM for int8. For avx512, where
int8 does make a difference, TVM is much faster.
I have a script
https://github.com/Edgecortix-Inc/pytorch_quantization/tree/master/tvm_
@adb I had an old PR https://github.com/apache/incubator-tvm/pull/4741 which
demonstrates conv + bias + relu fusion in the "hard" way (before composite was
introduced). I'll send a new one ASAP after PRs by @matt-arm are merged.
---
In my case, these intermediate structs are strongly tied to our executor. They
are plain structs, so much easier to work with than the full-blown Relay IR. So for
me they are not really overhead.
---
@trevor-m Thanks for confirming. I can't talk about specifics, but let's just say any C++ serialization lib should be able to serialize/deserialize structs into a binary blob, and I am just using one of them.
Note that I'm not serializing the Relay subgraph as it is, but some structs that get con…
I think the TensorRT integration by AWS works in a similar way. If I remember correctly, they use JSON instead of a binary format.
---
"The executor" part, including API calls to DNNL, is defined in another lib
that is built outside of TVM and linked to my TVM build. My TVM external runtime passes the binary or deserialized graph representation, together with arguments from TVM, to that lib, and this lib knows how to execute the graph. The…
Hmm, for my use case, I simply serialize a Relay subgraph into some binary format, pass the binary to the runtime module, and deserialize it there. Then I can execute this graph with the arguments I receive from TVM in whatever way I like, including offloading to DNNL. This week I integrated upstream changes…
@matt-arm Have you considered using a different codegen than CSource? To deal with large constants, I think a binary-serialization-based codegen is a good fit.
---
I'm not sure what you are asking. Whatever qconfig you quantize your Torch
model with, the converted Relay model is equivalent to the quantized Torch
model.
But due to the difference in numerics, the raw floating-point outputs of quantized Torch models and the converted Relay models can be slightly…
I don't know, but I don't think we expose the CUDA stream abstraction to the Python frontend. We typically don't care about CUDA streams (we don't support any concurrency at runtime).
What is your use case?
---
Correct. You can tweak the schedule to change the launch config, but as a user
you shouldn't care about the exact size of grid/block.
If you really want the best perf, use autotvm to tune your schedule; the resulting grid/block size is chosen based on real measurements.
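To illustrate the first point (a toy example of mine, not a real conv schedule): the grid/block launch config falls out of how the schedule binds loops to GPU thread axes.
```
import tvm
from tvm import te

# Toy CUDA schedule: the split factor decides the launch config
# (here 1024 / 64 = 16 blocks of 64 threads).
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))
fcuda = tvm.build(s, [A, B], target="cuda")
print(fcuda.imported_modules[0].get_source())  # generated CUDA kernel
```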
---
A related PR with more discussion
https://github.com/apache/incubator-tvm/pull/5202
---
The answer is that we use the CUDA driver API to launch kernels from C++ code. `kernel<<<grid, block>>>(a, b, c)` is not the only way to launch a kernel, and it requires compiling with NVCC.
See
https://github.com/apache/incubator-tvm/blob/e0122c0ea68043372220e4e02b81692c34832227/src/runtime/cuda/cuda_module.cc#L1
Try a bigger number of channels. Winograd is slow for small channel counts.
---
Unfortunately, we don't have any pip package at the moment. But a runtime-only package sounds reasonable. cc @tqchen
I'd imagine you'd build the TVM code outside of Torch first and export the build artifact as a shared lib. Then from Torch you can load the TVM-generated shared lib in either Python code…
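Roughly what I have in mind (a sketch under my own assumptions; the path, names, and shapes are placeholders):
```
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Build a tiny Relay function and export the compiled code as a shared lib.
x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target="llvm")
lib.export_library("deploy_lib.so")

# Later, e.g. on the Torch side, load the artifact with just the TVM runtime.
loaded = tvm.runtime.load_module("deploy_lib.so")
rt = graph_runtime.create(graph, loaded, tvm.cpu(0))
rt.set_input("x", np.random.rand(1, 3, 224, 224).astype("float32"))
rt.run()
out = rt.get_output(0)
```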
Yes, unless you really have integer tensors somewhere, which is highly unlikely, you can ignore this warning.
This warning is added for the case of converting Torch models jitted by `torch.jit.script(...)`. In script mode there is no way to tell the type of the input tensors, so we just assume it is float32…