Thank you for the answer.
So it is defined in a pass, not as a direct implementation.
Thanks for the good information.
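For anyone else looking: a minimal sketch of how to observe this, assuming the
0.7-era Relay API, where the SimplifyInference pass rewrites batch_norm into
elementwise ops before codegen:

```python
import tvm
from tvm import relay

# Build a tiny function containing batch_norm.
data = relay.var("data", shape=(1, 16, 8, 8))
gamma = relay.var("gamma", shape=(16,))
beta = relay.var("beta", shape=(16,))
mean = relay.var("mean", shape=(16,))
var = relay.var("var", shape=(16,))
out = relay.nn.batch_norm(data, gamma, beta, mean, var)[0]
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(out), out))

# SimplifyInference (run after type inference) decomposes batch_norm into
# multiply/add/sqrt, so there is no standalone batch_norm kernel to find.
mod = relay.transform.InferType()(mod)
mod = relay.transform.SimplifyInference()(mod)
print(mod)  # the printed IR no longer contains nn.batch_norm
```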
---
Hello!
I would like to see the implementation of batch_normalization used by Relay.
However, I searched the source code and could not find any information about
the implementation.
Is it implemented in another location?
---
Thank you for the reply!
---
Can you show me a simple code sample of how it was implemented?
---
I've just checked the
[benchmarks](https://github.com/apache/incubator-tvm/blob/master/apps/benchmark/gpu_imagenet_bench.py),
but the 2080ti doesn't seem to be listed there.
Maybe the 2080ti also needs to be tuned.
---
I am not sure, but the Titan, 2080ti, and 1080ti have tuned configurations in TVM.
So if you want to get optimal performance from the 1070ti, tuning seems to
be the right choice.
Also, there is a simple tuning template provided in the TVM tutorial, so you can
use it for tuning.
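Roughly, the flow from the tutorial looks like this; a sketch assuming a local
NVIDIA GPU and the 0.7-era AutoTVM API (the example network, trial count, and
log file name are placeholders):

```python
import tvm
from tvm import autotvm, relay
from tvm.relay import testing

# Example workload; swap in your own model.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
target = tvm.target.cuda()

# Extract the tunable conv2d tasks from the network.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params,
    ops=(relay.op.get("nn.conv2d"),))

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, timeout=4))

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("tune.log")])

# Compile with the best configs found during tuning.
with autotvm.apply_history_best("tune.log"):
    with tvm.transform.PassContext(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```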
---
Hello!
I am currently trying to optimize a network running on the Mali GPU of an
rk3399 board, using the current version of TVM (0.7dev1).
I followed the simple auto-tuning
[tutorial](https://docs.tvm.ai/tutorials/autotvm/tune_relay_mobile_gpu.html)
and I am having problems.
TVM
Hello!
I have some questions about the Arm cores (rk3399 Firefly board).
I want to run the Mali GPU and the Arm CPU in parallel.
Therefore, after creating two threads, I want to execute a module running on
the CPU and one running on the GPU, one in each thread.
I want to use the two big cores for the thread running the CPU, and a
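For concreteness, a minimal sketch of the two-thread setup I have in mind,
assuming two separately compiled deploy libraries (`graph_cpu`/`lib_cpu`/
`params_cpu` built with target `llvm`, and the GPU counterparts with
`opencl -device=mali`; all of these names are placeholders):

```python
import threading
import numpy as np
import tvm
from tvm.contrib import graph_runtime

def run(graph, lib, params, ctx):
    # One graph runtime per device, each driven by its own thread.
    module = graph_runtime.create(graph, lib, ctx)
    module.set_input(**params)
    module.set_input("data",
                     np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
    module.run()
    return module.get_output(0)

# The graph/lib/params pairs are assumed to come from two earlier
# relay.build calls with the respective targets.
t_cpu = threading.Thread(target=run, args=(graph_cpu, lib_cpu, params_cpu, tvm.cpu(0)))
t_gpu = threading.Thread(target=run, args=(graph_gpu, lib_gpu, params_gpu, tvm.cl(0)))
t_cpu.start(); t_gpu.start()
t_cpu.join(); t_gpu.join()
```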
In addition, I experimented, and if I set the shapes of the input and the param
as follows, it works normally.

## Setting Shape of Tensor

```python
input_size = (1, 3, 224, 224)
p1_size = (64, 3, 3, 3)
```
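For context, a sketch of the build that works with these shapes (the Mali
target strings are assumptions, and the single conv2d stands in for my real
network):

```python
import tvm
from tvm import relay

input_data = relay.var("input", shape=(1, 3, 224, 224))
p1 = relay.var("p1", shape=(64, 3, 3, 3))
net = relay.nn.conv2d(input_data, p1, kernel_size=(3, 3),
                      channels=64, padding=(1, 1))
func = relay.Function([input_data, p1], net)

# Cross-compile for the rk3399: OpenCL kernels for the Mali GPU,
# AArch64 host code for the CPU side.
graph, lib, params = relay.build(
    tvm.IRModule.from_expr(func),
    target="opencl -device=mali",
    target_host="llvm -mtriple=aarch64-linux-gnu")
```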
---
Hello!
Currently, I am testing a simple convolution using TOPI.
I experimented with the code below.
```python
import numpy as np
import topi
from tvm import relay
from tvm.relay import testing
import tvm
from tvm.contrib import graph_runtime

## Setting Target and ctx
targ
```
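Since the snippet above is cut off, here is a self-contained sketch of the
kind of TOPI test I mean (naive default schedule, `llvm` target, shapes as
placeholders):

```python
import numpy as np
import tvm
from tvm import te
import topi

# Declare a direct NCHW conv2d with TOPI and build it with a naive schedule.
data = te.placeholder((1, 3, 224, 224), name="data")
kernel = te.placeholder((64, 3, 3, 3), name="kernel")
conv = topi.nn.conv2d(data, kernel, strides=1, padding=1,
                      dilation=1, layout="NCHW")
s = te.create_schedule(conv.op)
func = tvm.build(s, [data, kernel, conv], target="llvm")

a = tvm.nd.array(np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
w = tvm.nd.array(np.random.uniform(size=(64, 3, 3, 3)).astype("float32"))
out = tvm.nd.array(np.zeros((1, 64, 224, 224), dtype="float32"))
func(a, w, out)
```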
Hello!
I am trying to use the graph debugger to measure the performance of the VGG16
on the rk3399 board.
I simply debugged it using the code below.
```python
import numpy as np
from tvm import relay
from tvm.relay import testing
import tvm
from tvm import te
from tvm.contrib.d
```
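The snippet is truncated, so for reference, a hedged sketch of how I use the
debugger (assuming `graph`, `lib`, and `params` come from `relay.build` of the
VGG-16 workload; the dump path is a placeholder):

```python
import numpy as np
import tvm
from tvm.contrib.debugger import debug_runtime

# debug_runtime mirrors graph_runtime but records per-operator timings
# under dump_root, which is what I inspect for the measurements.
m = debug_runtime.create(graph, lib, tvm.cpu(0), dump_root="/tmp/tvmdbg")
m.set_input(**params)
m.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
m.run()
```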
Hello.
On the rk3399, I found a performance decrease during inference with the VGG-16
model.
Performance was measured using the test code below.
```python
import tvm
import tvm.relay as relay
from tvm.contrib import graph_runtime
import numpy as np
import topi
from tvm.relay.testin
```
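The code above is cut off; the measurement itself is done along these lines (a
sketch, assuming `graph`, `lib`, `params`, and `ctx` from an earlier
`relay.build`):

```python
import numpy as np
from tvm.contrib import graph_runtime

module = graph_runtime.create(graph, lib, ctx)
module.set_input(**params)
# time_evaluator runs the whole graph repeatedly and averages.
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
res_ms = np.array(ftimer().results) * 1000
print("mean %.2f ms (std %.2f)" % (res_ms.mean(), res_ms.std()))
```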
Hello!
I am currently using rk3399 boards to measure performance by running VGG-16
with an old TVM and the current TVM. Below is the specification.
rk3399 device 1 -> old version of TVM, Ubuntu 16.04 + LLVM 8.0.0
rk3399 device 2 -> new version of TVM, Ubuntu 18.04 + LLVM 8.0.0
and I test
Thank you, Robert!
This is really useful information. Thank you.
---
Hello!
Currently I am trying to run inference with VGG-16 on the Arm CPU.
```python
import tvm
import tvm.relay as relay
from tvm.contrib import graph_runtime
import numpy as np
import topi
from tvm.relay.testing.temp_op_attr import TempOpAttr

target_arm_cpu = tvm.target.create('llvm
```
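The target line above is cut off; a sketch of the rest of the build I am
attempting (the exact target string for the rk3399 is an assumption):

```python
import tvm
import tvm.relay as relay
from tvm.relay import testing

target_arm_cpu = tvm.target.create(
    "llvm -mtriple=aarch64-linux-gnu -mattr=+neon")
mod, params = testing.vgg.get_workload(num_layers=16, batch_size=1)
with tvm.transform.PassContext(opt_level=3):
    graph, lib, params = relay.build(mod, target=target_arm_cpu, params=params)
```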
Hello!
I use an rk3399 Firefly board with LLVM 8.0.0 and Ubuntu 18.04.
I ran VGG-16 on the board using the following code.
```python
import tvm
from tvm import te
import tvm.relay as relay
from tvm.contrib import graph_runtime
import numpy as np
import topi
from tvm.relay import t
```
Thanks for the advice, haichen!
I think I should try optimizing winograd using AutoTVM.
Thank you.
---
Thank you for the reply, haichen! I will try it.
It is curious that the two convolutions are using the fallback configuration,
not one tuned by AutoTVM.
However, winograd is slow, with almost a 200x performance difference. In
general, does winograd perform this poorly before tuning?
---
Hello!
I am currently implementing Winograd conv2d through Relay.
```python
data_shape = (1, 64, 224, 224)
w_shape = (64, 64, 3, 3)
input_data = relay.var('input', relay.TensorType(data_shape, "float32"))
p1 = relay.var('wgt', relay.TensorType(w_shape, "float32"))
FIR = relay.nn.contrib_
```
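The last line is truncated; for reference, a sketch of the contrib winograd
pairing I am trying (tile_size=4 and the op arguments are assumptions):

```python
import tvm
from tvm import relay

data_shape = (1, 64, 224, 224)
w_shape = (64, 64, 3, 3)
input_data = relay.var('input', relay.TensorType(data_shape, "float32"))
p1 = relay.var('wgt', relay.TensorType(w_shape, "float32"))

# Transform the weight once, then call the winograd conv that expects
# an already-transformed weight.
wgt_t = relay.nn.contrib_conv2d_winograd_weight_transform(p1, tile_size=4)
out = relay.nn.contrib_conv2d_winograd_without_weight_transform(
    input_data, wgt_t, tile_size=4,
    channels=64, kernel_size=(3, 3), padding=(1, 1))
func = relay.Function([input_data, p1], out)
```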
I conducted additional experiments.
When using the Conv 1.2 layer of the VGG-16 network, according to the
[paper](https://arxiv.org/abs/1509.09308), performance should be better than
direct conv2d.
But the result is that direct conv2d is better.
`input_img shape = (1,64,224,224)` ## NCHW format
I tried a bigger-channel image and weight, like below.
`img_shape = (1,512,224,224), w_shape = (256,512,3,3)`
The shape format is NCHW, and the results are:
direct => 50.641 ms
winograd => 604.84 ms
The performance is still worse than direct conv2d...
Should I use more channels?
---
Hello!
Currently, I am testing to compare the performance of direct conv2d and
winograd conv2d using TOPI.
However, as a result of my experiments, conv2d using the winograd algorithm is
much worse than direct.
The code below is what I experimented with.

```python
## data shape
data_shape = (1,
```
Hello!
I am currently building a network using the Winograd algorithm, and the problem
is that its performance is lower than direct conv2d.
According to the paper [Fast Algorithms for Convolutional Neural
Networks](https://arxiv.org/pdf/1509.09308.pdf), performance should be higher
than the direct implementation as