Hi.
First question:
My target is "llvm -mcpu=cascadelake".
In this situation, does the TVM runtime (compiler?) use the AVX-512 unit? (Not
using AutoTVM or the auto-scheduler.)
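Not part of the original question, but one way to answer it empirically: build a small workload for that target and grep the generated assembly for AVX-512 (zmm) registers. A minimal sketch, assuming a stock TVM build with LLVM:

```python
# Minimal sketch (not from the post): build a small dense workload for
# the cascadelake target and look for AVX-512 (zmm) registers in the
# generated assembly.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 512), dtype="float32")
w = relay.var("w", shape=(512, 512), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))

target = tvm.target.Target("llvm -mcpu=cascadelake")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

asm = lib.get_lib().get_source("asm")
print("AVX-512 (zmm) registers used:", "zmm" in asm)
```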
Second question:
1. 2 cores (16 threads) at 2.3 GHz
2. 4 cores (16 threads) at 2.8 GHz
I set TVM_NUM_THREADS=16 and run benchmarks.
Setup 2 is a little slower than setup 1.
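For what it's worth, a measurement sketch (mine, not from the post): TVM_NUM_THREADS should be exported before the TVM thread pool starts, and time_evaluator gives steadier numbers than timing a single run when comparing the two machines. The dense workload below is just a stand-in:

```python
# Sketch: export TVM_NUM_THREADS before importing tvm, then average
# many runs with time_evaluator instead of timing a single run().
import os
os.environ["TVM_NUM_THREADS"] = "16"

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor  # graph_runtime on older releases

x = relay.var("x", shape=(1, 1024), dtype="float32")
w = relay.var("w", shape=(1024, 1024), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm -mcpu=cascadelake")

dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("x", np.random.rand(1, 1024).astype("float32"))
module.set_input("w", np.random.rand(1024, 1024).astype("float32"))

ftimer = module.module.time_evaluator("run", dev, number=10, repeat=3)
print("mean runtime: %.3f ms" % (ftimer().mean * 1e3))
```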
Hello,
I'm trying to use the Relay TensorRT integration to accelerate TensorFlow
inference, following the [Relay TensorRT
integration](https://tvm.apache.org/docs/deploy/tensorrt.html) and [compile
TensorFlow
model](https://tvm.apache.org/docs/tutorials/frontend/from_tensorflow.html#sphx-glr-tuto
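For context, the overall flow from the linked TensorRT guide looks roughly like the sketch below; the tiny conv net stands in for the TensorFlow import, and the exact return value of partition_for_tensorrt differs between TVM versions:

```python
# Rough sketch of the flow in the linked guide; here a tiny conv net
# stands in for mod/params from relay.frontend.from_tensorflow(...).
import numpy as np
import tvm
from tvm import relay
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
net = relay.nn.relu(relay.nn.conv2d(x, w, kernel_size=(3, 3), channels=16))
mod = tvm.IRModule.from_expr(net)
params = {"w": tvm.nd.array(np.random.rand(16, 3, 3, 3).astype("float32"))}

# Mark TensorRT-supported subgraphs for offloading.  In the TVM version
# the guide documents, this returns (mod, config); newer versions return
# only the partitioned module.
mod, config = partition_for_tensorrt(mod, params)

with tvm.transform.PassContext(
        opt_level=3, config={"relay.ext.tensorrt.options": config}):
    lib = relay.build(mod, target="cuda", params=params)
```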
If you really want to add an op, I'd just call it matmul. An even better
version would be a matmul with all four possible transpose combinations, with
dense being just one of them, but that needs many changes in the code base.
cc @tqchen
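As a point of reference (my addition, not from the reply above), a `[K, N]` weight can already be fed to nn.dense today by composing it with an explicit transpose:

```python
# Sketch: express y = x @ w with w stored as [K, N] using the existing
# nn.dense (which expects the weight as [N, K]) plus a transpose.
import tvm
from tvm import relay

x = relay.var("x", shape=(8, 64), dtype="float32")      # [M, K]
w = relay.var("w", shape=(64, 32), dtype="float32")     # [K, N]
y = relay.nn.dense(x, relay.transpose(w, axes=(1, 0)))  # weight -> [N, K]
print(tvm.IRModule.from_expr(y))
```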
This looks more like a hack :slight_smile:
If I want to do it in Relay, I should add a variant of nn.dense (say, name it
nn.dense_transposed_kernel) and then register a function convert_dense(...) with
register_convert_op_layout("nn.dense"), right?
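Purely for illustration, a hypothetical sketch of what such a registration could look like, modeled on the nn.conv2d ConvertLayout hooks in the TVM docs; the "CN" layout string and the transposed-kernel rewrite are made up:

```python
# Hypothetical sketch; nn.dense has no convert-layout hook upstream and
# nn.dense_transposed_kernel does not exist, so the rewrite below just
# approximates it with an explicit transpose.
from tvm import relay
from tvm.relay.op import op as reg

@reg.register_convert_op_layout("nn.dense")
def convert_dense(attrs, inputs, tinfos, desired_layouts):
    data, weight = inputs
    if desired_layouts and desired_layouts[1] == "CN":
        return relay.nn.dense(data, relay.transpose(weight, axes=(1, 0)))
    return relay.nn.dense(data, weight)
```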
There's no change for nn.dense because it doesn't have the variant you want, as
you already pointed out.
If you're using BYOC, then there is a trick you can play at the moment. Since
the preprocessing still maintains the type, you cannot simply transpose the
weight from `[N, C]` to `[C, N]`. On
Hi @comaniac, I looked into your example and did a simple experiment similar
to it.
My example network, imported into Relay, is as below:
#[version = "0.0.5"]
def @main(%input.1: Tensor[(1, 1, 32, 16), float32], %conv.0.bias:
Tensor[(1), float32], %conv.0.weight: Tensor[(1, 1, 3, 3), fl
The answer would definitely be different in the case of not using BYOC. Without
BYOC, every backend is handled by the TVM compilation pipeline, which means
every operator has to have a corresponding TOPI implementation. Since the data
layout affects the TE compute semantics, an op with a different data l
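To make the layout point concrete, a small TE sketch (mine, not from the reply): the same dense result needs a different compute definition depending on whether the weight is stored `[N, K]` or `[K, N]`:

```python
# The compute expression itself changes with the weight layout, which is
# why each layout needs its own TOPI implementation and schedule.
from tvm import te

M, N, K = 8, 32, 64
A = te.placeholder((M, K), name="A")

W_nk = te.placeholder((N, K), name="W_nk")   # nn.dense layout
k1 = te.reduce_axis((0, K), name="k1")
C1 = te.compute((M, N), lambda i, j: te.sum(A[i, k1] * W_nk[j, k1], axis=k1))

W_kn = te.placeholder((K, N), name="W_kn")   # transposed layout
k2 = te.reduce_axis((0, K), name="k2")
C2 = te.compute((M, N), lambda i, j: te.sum(A[i, k2] * W_kn[k2, j], axis=k2))
```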
The TensorRT execution we use in TVM is not asynchronous, so there is no need
to sync; `module.run()` won't return until inference is completed. Actually, I
think `run()` is never asynchronous in TVM?
5 ms is not an unreasonable inference time for MobileNet v2 with TensorRT on
Xavier, although I
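If you want to rule out asynchrony when timing by hand, an explicit device sync after run() is harmless; a standalone sketch with a dummy dense model (not from the reply):

```python
# Sketch: time run() with an explicit device sync; the sync is a no-op
# if execution already blocked, so it can only make the timing honest.
import time
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

x = relay.var("x", shape=(1, 1024), dtype="float32")
w = relay.var("w", shape=(1024, 1024), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")

dev = tvm.cuda(0)                 # tvm.gpu(0) on older TVM releases
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("x", np.random.rand(1, 1024).astype("float32"))
module.set_input("w", np.random.rand(1024, 1024).astype("float32"))

start = time.time()
module.run()
dev.sync()                        # wait for any pending device work
print("latency: %.3f ms" % ((time.time() - start) * 1e3))
```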
The approach I suggested is the most straightforward one. Relay to TIR is not a
one-to-one mapping: a Relay node may be lowered to different TIR functions for
different targets and input shapes/dtypes.
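One practical option (my suggestion, not from the reply) is the debug executor, which reports per-operator times at the lowered level; that is about as close as you can get to profiling "at the Relay level" given the non one-to-one mapping:

```python
# Sketch using the debug executor (tvm.contrib.debugger.debug_runtime on
# older TVM releases); run() prints a per-operator time breakdown.
import numpy as np
import tvm
from tvm import relay
from tvm.contrib.debugger import debug_executor

x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(64, 64), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.relu(relay.nn.dense(x, w)))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu(0)
m = debug_executor.create(lib.get_graph_json(), lib.get_lib(), dev)
m.set_input("x", np.random.rand(1, 64).astype("float32"))
m.set_input("w", np.random.rand(64, 64).astype("float32"))
m.run()
```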
The simplest way is to check whether the results from TVM and TensorRT match.
For GPU, it's entirely possible that TensorRT outperforms TVM if you didn't
tune the model.
Also cc @trevor-m
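A concrete version of that check, sketched under the same API assumptions as the linked TensorRT guide, with a tiny conv net standing in for the real model:

```python
# Sketch: build the same model with and without TensorRT offloading and
# compare the outputs.  API details follow the TensorRT deploy guide and
# may differ across TVM versions.
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
net = relay.nn.relu(relay.nn.conv2d(x, w, kernel_size=(3, 3), channels=16))
mod = tvm.IRModule.from_expr(net)
params = {"w": tvm.nd.array(np.random.rand(16, 3, 3, 3).astype("float32"))}
data = np.random.rand(1, 3, 224, 224).astype("float32")
dev = tvm.cuda(0)

def run(built):
    m = graph_executor.GraphModule(built["default"](dev))
    m.set_input("x", data)
    m.run()
    return m.get_output(0).numpy()

with tvm.transform.PassContext(opt_level=3):
    lib_ref = relay.build(mod, target="cuda", params=params)   # plain TVM

trt_mod, config = partition_for_tensorrt(mod, params)
with tvm.transform.PassContext(
        opt_level=3, config={"relay.ext.tensorrt.options": config}):
    lib_trt = relay.build(trt_mod, target="cuda", params=params)

np.testing.assert_allclose(run(lib_trt), run(lib_ref), rtol=1e-3, atol=1e-3)
print("TVM and TensorRT results match")
```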