[Apache TVM Discuss] [Questions] Issues with Autotuning a Convolutional Network for VTA Simulator

2020-11-01 Thread keai007 via Apache TVM Discuss


Same issue here. Can anyone help? 😂





---
[Visit Topic](https://discuss.tvm.apache.org/t/issues-with-autotuning-a-convolutional-network-for-vta-simulator/5821/5) to respond.


[Apache TVM Discuss] [Questions] Failure [INSTALL_FAILED_OLDER_SDK]

2020-11-01 Thread venkataraju koppada via Apache TVM Discuss


Hi @tkat0,

Do you have any suggestions on this issue?

Thanks and Regards,
Raju





---
[Visit Topic](https://discuss.tvm.apache.org/t/failure-install-failed-older-sdk/7792/3) to respond.


[Apache TVM Discuss] [Questions] I can't find directory 'device' in tvm/micro

2020-11-01 Thread Qelk123 via Apache TVM Discuss


Hi Andrew,

Thank you for your reply. I have successfully solved that problem using a TVM repository version posted in March. However, when I test the *.bin after flashing it to the NUCLEO-F746ZG, something goes wrong:
![QQ图片20201101232935|690x318](upload://pIQQlVW5IkUkLtitlAaN3QdRndh.png)
![QQ图片20201101232940|690x331](upload://j98CNnzFO9bXz3Cz9dDIm1KCURO.png)
There are several errors, but I think the one that makes the program abort is: `monitor thread exiting while process still alive; killing process`
Is there something wrong with my settings? How should I solve this problem? I would very much appreciate any suggestions.
Mike





---
[Visit Topic](https://discuss.tvm.apache.org/t/i-cant-find-directorydevice-in-tvm-micro/8272/3) to respond.


[Apache TVM Discuss] [Questions] Tutorial, "How to optimize matmul with Auto TensorCore CodeGen", cannot work on my machine

2020-11-01 Thread Steven via Apache TVM Discuss


My machine has a V100 with CUDA 10.2. The tutorial does not generate code that uses Tensor Core primitives; it only produces ordinary CUDA code. Can anyone help me figure out what is going wrong? Thanks!
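For reference, the first sanity check I would run is along the lines of the guard used in the tutorial (a sketch only; it assumes the V100 is CUDA device 0):

```python
# Sketch of a Tensor Core capability check, mirroring the guard in the tutorial.
# Assumes the V100 is CUDA device 0.
import tvm
from tvm.contrib import nvcc

ctx = tvm.gpu(0)
print("compute version:", ctx.compute_version)                        # "7.0" on a V100
print("have tensorcore:", nvcc.have_tensorcore(ctx.compute_version))  # expected True
```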
Here is the code I am trying to run:

```python
import logging
import sys

import numpy as np
import tvm
from tvm import te
import tvm.testing

from tvm import autotvm
from tvm.contrib import nvcc


def matmul_nn(A, B, L, dtype="float16", layout="NN"):
    k = te.reduce_axis((0, L), name="k")
    if dtype == "float16":
        out_type = "float"
    elif dtype == "int8":
        out_type = "int"
    elif dtype == "int4" or dtype == "int1":
        out_type = "int"
    if layout == "NN":
        return te.compute(
            (N, M), lambda i, j: te.sum(A[i, k].astype(out_type) * B[k, j].astype(out_type), axis=k)
        )
    if layout == "NT":
        return te.compute(
            (N, M), lambda i, j: te.sum(A[k, i].astype(out_type) * B[k, j].astype(out_type), axis=k)
        )
    if layout == "TN":
        return te.compute(
            (N, M), lambda i, j: te.sum(A[i, k].astype(out_type) * B[j, k].astype(out_type), axis=k)
        )
    if layout == "TT":
        return te.compute(
            (N, M), lambda i, j: te.sum(A[k, i].astype(out_type) * B[j, k].astype(out_type), axis=k)
        )


def test_gemm(N, L, M, dtype, layout):
    if layout == "NN":
        shape_a = (N, L)
        shape_b = (L, M)
    elif layout == "NT":
        shape_a = (L, N)
        shape_b = (L, M)
    elif layout == "TN":
        shape_a = (N, L)
        shape_b = (M, L)
    elif layout == "TT":
        shape_a = (L, N)
        shape_b = (M, L)
    else:
        print("Unsupported layout:", layout)
        sys.exit(1)
    A = te.placeholder(shape_a, name="A", dtype=dtype)
    B = te.placeholder(shape_b, name="B", dtype=dtype)
    C = matmul_nn(A, B, L, dtype, layout)

    s = te.create_schedule(C.op)
    y, x = s[C].op.axis
    k = s[C].op.reduce_axis[0]

    # storage_align params
    factor = 16
    offset = 8
    if dtype == "int8":
        factor = 32
        offset = 16
    elif dtype == "int4":
        factor = 64
        offset = 32
    elif dtype == "int1":
        factor = 256
        offset = 128

    # create cache stages
    AA = s.cache_read(A, "shared", [C])
    if layout == "NN" or layout == "TN":
        s[AA].storage_align(AA.op.axis[0], factor, offset)
    AL = s.cache_read(AA, "local", [C])
    BB = s.cache_read(B, "shared", [C])
    if layout == "TT" or layout == "NT":
        s[BB].storage_align(BB.op.axis[0], factor, offset)
    BL = s.cache_read(BB, "local", [C])
    CL = s.cache_write(C, "local")

    bx = 4
    by = 32
    step_k = 16
    v = 8

    # thread tile
    TX = 8
    TY = 1
    if dtype == "int4" or dtype == "int1":
        TX = 2
    # warp tile
    warp_tile_m = 16  # it could also be 8 or 32 on CUDA version >= 10.0
    warp_tile_k = 16  # it must be 16 for fp16/int8 data type
    if dtype == "int4":
        warp_tile_m = 8
        warp_tile_k = 32
    elif dtype == "int1":
        warp_tile_m = 8
        warp_tile_k = 128
    # block tile
    tile_x = bx * TX
    tile_y = by * TY

    yo, ty = s[C].split(y, tile_y)
    ty, yi = s[C].split(ty, TY)

    # schedule for C stage
    xo, xi = s[C].split(x, tile_x)
    WX = min(warp_tile_m, tile_x)
    tz, xi = s[C].split(xi, WX)
    tx, xi = s[C].split(xi, TX)
    s[C].reorder(yo, xo, tz, ty, tx, yi, xi)
    s[C].bind(yo, te.thread_axis("blockIdx.y"))
    s[C].bind(xo, te.thread_axis("blockIdx.x"))
    s[C].bind(ty, te.thread_axis("threadIdx.y"))
    s[C].bind(tz, te.thread_axis("threadIdx.z"))
    s[C].bind(tx, te.thread_axis("threadIdx.x"))

    # schedule for CL stage
    ko, ki = s[CL].split(k, step_k * warp_tile_k)
    kl, ki = s[CL].split(ki, warp_tile_k)
    s[CL].compute_at(s[C], tx)
    yo, xo = CL.op.axis
    s[CL].reorder(ko, kl, ki, yo, xo)

    # schedule for AA stage
    s[AA].compute_at(s[CL], ko)
    xo, xi = s[AA].split(s[AA].op.axis[1], factor=bx * v)
    tz, tx = s[AA].split(xi, factor=(WX // TX) * v)
    tx, vec = s[AA].split(tx, factor=v)
    fused = s[AA].fuse(s[AA].op.axis[0], xo)
    _, ty = s[AA].split(fused, factor=by)
    s[AA].bind(ty, te.thread_axis("threadIdx.y"))
    s[AA].bind(tz, te.thread_axis("threadIdx.z"))
    s[AA].bind(tx, te.thread_axis("threadIdx.x"))
    # vectorization is very important for float16/int8 inputs
    s[AA].vectorize(vec)

    # schedule for BB stage
    s[BB].compute_at(s[CL], ko)
    xo, xi = s[BB].split(s[B
```

[Apache TVM Discuss] [Questions] TVM int8 graph Optimization relu problem

2020-11-01 Thread dingyongchao via Apache TVM Discuss


In TVM there are many operators that cannot be computed in int8, such as mean and avg_pool2d. Taking the mean op as an example, TVM automatically inserts dequantize and quantize ops around it. After graph optimization I get the graph on the right, as shown in the picture. However, I think the graph on the left would be the better optimization, because it needs fewer memory accesses. So I wonder how to modify the graph optimization passes to generate the result on the left.

![image|242x500](upload://jGVPhgpRmrhDgUx4Q39IXjldJw6.jpeg)
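For clarity, the pattern I am describing looks roughly like the hand-written Relay below (shapes, scale, and zero point are made-up placeholders; this is only a sketch of the dequantize → mean → quantize structure, not the exact graph TVM's quantization pass emits):

```python
# Illustration only: hand-built Relay showing dequantize -> mean -> quantize
# around an op that is not computed in int8. All values are placeholders.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64, 56, 56), dtype="int8")
scale = relay.const(0.05, "float32")
zero_point = relay.const(0, "int32")

deq = relay.qnn.op.dequantize(x, input_scale=scale, input_zero_point=zero_point)
m = relay.mean(deq, axis=[2, 3], keepdims=True)  # mean runs in float32
q = relay.qnn.op.quantize(m, output_scale=scale, output_zero_point=zero_point, out_dtype="int8")

mod = tvm.IRModule.from_expr(relay.Function([x], q))
print(relay.transform.InferType()(mod))
```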





---
[Visit Topic](https://discuss.tvm.apache.org/t/tvm-int8-graph-optimization-relu-problem/8347/1) to respond.


[Apache TVM Discuss] [Questions] Issues with Autotuning a Convolutional Network for VTA Simulator

2020-11-01 Thread YuxuanGuo via Apache TVM Discuss


Same issue here. I'm trying to tune some conv2d_packed workloads on the VTA fast simulator and measure the speed of these ops. What should I do? How should I set the `target` of the autotvm task and the `runner` of `measure_option`?
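For context, the kind of setup I have in mind is roughly the following (a sketch only; the tracker host/port and tuning numbers are placeholders, and the runner configuration is exactly the part I am unsure about):

```python
# Rough sketch of measuring a packed conv2d task against the VTA fast simulator.
# Assumes vta_config.json has TARGET = "sim" and that an RPC tracker is running
# with a device registered under the key env.TARGET. Host/port are placeholders.
from tvm import autotvm
import vta

env = vta.get_env()

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner(
        env.TARGET,        # device key, e.g. "sim"
        host="127.0.0.1",  # RPC tracker host (placeholder)
        port=9190,         # RPC tracker port (placeholder)
        number=5,
        timeout=60,
    ),
)

# Tasks would then be created with target=env.target and target_host=env.target_host,
# and passed to a tuner together with measure_option.
```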





---
[Visit Topic](https://discuss.tvm.apache.org/t/issues-with-autotuning-a-convolutional-network-for-vta-simulator/5821/6) to respond.
