[Apache TVM Discuss] [Development] Quantization and 3D convolution
I implemented conv3d with int8 as follows: I created the file `python/tvm/topi/cuda/conv3d_int8.py`, which implements the operation itself.

```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=invalid-name
# pylint: disable=no-value-for-parameter
"""Int8 conv3d in NCDHWc layout"""
import tvm
from tvm import te
from tvm import autotvm

from .injective import schedule_injective_from_existing
from .tensor_intrin import dp4a
from ..nn.pad import pad
from ..nn.conv3d import unpack_NCDHWc_to_ncdhw
from ..nn.util import get_pad_tuple3d
from ..util import simplify, get_const_tuple, traverse_inline


def conv3d_ncdhw_int8(data, kernel, strides, padding, dilation, out_dtype="int32"):
    """Compute conv3d internally using conv3d_ncdhwc layout for int8 dtype"""
    assert data.dtype in ("int8", "uint8")
    assert kernel.dtype in ("int8", "uint8")
    assert data.dtype == kernel.dtype
    packed_out = conv3d_NCDHWc_int8(data, kernel, strides, padding, dilation, "NCDHW", out_dtype)
    return unpack_NCDHWc_to_ncdhw(packed_out, out_dtype)


def schedule_conv3d_ncdhw_int8(outs):
    """Create schedule for tensors"""
    return schedule_conv3d_NCDHWc_int8(outs)


@autotvm.register_topi_compute("conv3d_NCDHWc_int8.cuda")
def conv3d_NCDHWc_int8(cfg, data, kernel, stride, padding, dilation, layout, out_dtype):
    """Convolution operator in NCDHW[x]c layout for int8."""
    # print("conv3d_NCDHWc_int8")
    assert layout in ["NCDHW", "NCDHW4c"]

    ic_block_factor = 4
    oc_block_factor = 4

    pre_computed = len(kernel.shape) == 7
    if not pre_computed:
        batch, channels, depth, height, width = get_const_tuple(data.shape)
        assert (
            channels % ic_block_factor == 0
        ), "Number of input channels should be multiple of {}".format(ic_block_factor)
        packed_data = te.compute(
            (batch, channels // ic_block_factor, depth, height, width, ic_block_factor),
            lambda n, c, d, h, w, vc: data[n, c * ic_block_factor + vc, d, h, w],
            name="packed_data",
        )

        out_channels, in_channels, kernel_d, kernel_h, kernel_w = get_const_tuple(kernel.shape)
        assert out_channels % 4 == 0, "Number of output channels should be multiple of {}".format(
            oc_block_factor
        )
        packed_kernel = te.compute(
            (
                out_channels // oc_block_factor,
                in_channels // ic_block_factor,
                kernel_d,
                kernel_h,
                kernel_w,
                oc_block_factor,
                ic_block_factor,
            ),
            lambda oc_chunk, ic_chunk, kd, kh, kw, oc_block, ic_block: kernel[
                oc_chunk * oc_block_factor + oc_block,
                ic_chunk * ic_block_factor + ic_block,
                kd,
                kh,
                kw,
            ],
            name="packed_kernel",
        )
    else:
        packed_data = data
        packed_kernel = kernel

    batch, ic_chunk, in_depth, in_height, in_width, ic_block = get_const_tuple(packed_data.shape)
    oc_chunk, ic_chunk, kernel_d, kernel_h, kernel_w, oc_block, ic_block = get_const_tuple(
        packed_kernel.shape
    )

    assert isinstance(stride, int) or len(stride) == 3
    assert isinstance(dilation, int) or len(dilation) == 3

    if isinstance(stride, int):
        stride_d = stride_h = stride_w = stride
    else:
        stride_d, stride_h, stride_w = stride

    if isinstance(dilation, int):
        dilation_d = dilation_h = dilation_w = dilation
    else:
        dilation_d, dilation_h, dilation_w = dilation

    # compute the output shape
    pad_front, pad_top, pad_left, pad_back, pad_down, pad_right = get_pad_tuple3d(
        padding, (kernel_d, kernel_h, kernel_w)
    )
    # out_channel = num_filter
    out_depth = (in_depth - kernel_d + pad_front + pad_back) // stride_d + 1
    out_height = (in_height - kernel_h + pad_top + pad_down) // stride_h + 1
    out_width = (in_width - kernel_w + pad_left + pad_right) // stride_w + 1

    oshape = (batch, oc_chunk, out_depth, out_height, out_width, oc_block)

    # compute graph
    pad_before = [0, 0, pad_front, pad_top, pad_left, 0]
    pad_after = [0, 0, pad_back, pad_down, pad_right, 0]
```
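For readers who want to try the `conv3d_ncdhw_int8` entry point defined above, here is a hedged usage sketch. The import path and tensor shapes are assumptions (the file is the author's addition, not stock TVM), and it relies on the companion helpers it imports, such as `unpack_NCDHWc_to_ncdhw`, being present in the same branch.

```
# Hypothetical usage sketch of the compute entry point above; the import path
# and shapes are assumptions, not part of the original post.
from tvm import te
from tvm.topi.cuda.conv3d_int8 import conv3d_ncdhw_int8

# int8 NCDHW input and OIDHW kernel; channel counts are multiples of 4 so the
# internal NCDHW4c packing asserts pass.
data = te.placeholder((1, 16, 8, 56, 56), name="data", dtype="int8")
kernel = te.placeholder((32, 16, 3, 3, 3), name="kernel", dtype="int8")

# Builds only the compute graph; outside an AutoTVM tuning context the
# registered compute should fall back to a default config with a warning.
out = conv3d_ncdhw_int8(
    data, kernel, strides=(1, 1, 1), padding=(1, 1, 1), dilation=(1, 1, 1), out_dtype="int32"
)
print(out.shape)  # expected (1, 32, 8, 56, 56) in NCDHW with int32 accumulation
```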
[Apache TVM Discuss] [Development/RFC] [RFC] TVM Object Schema DSL
First of all, given that the schema generation itself is decoupled as a frontend, there won't be a problem for the lower-level production system: the objects themselves are still present as part of the C++ code and built into the system. The schema generation runs separately, just like clang-format (so the language used to implement the tool matters less). One thing that a lot of the discussion rightfully points out is that it is hard to build a generator that handles method bindings in a language-agnostic way. Given that consideration and the cost mentioned for complete generation, the current proposal focuses on the problem we can solve, namely data layout generation.

Notably, the proposal gives a more in-place generation process, which means we can start from the codebase as it is right now, gradually add objects to the schema, and make use of the feature without a disruptive transition. The tool will also work in a clang-format style, which means it can be invoked repeatedly and will complete the regions that need to be completed.

Now back to the overall problems and complexity. There are a few considerations:

- C0: We are adding more language bindings and would want quick data structure accessors for these bindings (e.g. Rust, or even a first-class Cython-based member accessor).
- C1: As we introduce more typing into TIR, we want to enable direct access to the data structures from generated code (allowing TIR to access runtime::Array and any IR nodes), which requires a data layout schema for these data structures.
- C2: As we start to enhance the Python side of the frontend, we eventually want users to be able to declare their ADTs in Python, as part of an enhanced TVMScript.

While it is true that keeping the current C++-only binding would not gain a lot from schema generation, there are additional gains in the area of C0. More importantly, a schema is a prerequisite for enabling C1. Notably, the compiler does not have to depend on Python to make use of C1, since we can still generate the layout info into a backend language and register it there, but Python could be a quick starting point.

Of course, the above considerations do not force us to use the Python AST as one of the frontends to the schema; C2 is certainly one motivation to enable this route. Notably, there is no intention to support arbitrary Python. As with TVMScript, we want to define a clear syntax for the data layout itself, which is critical to the enablement above, while acknowledging that it is simply not possible to define a language that handles method definitions/bindings in all languages well; developers can therefore still edit directly in the target language. Notably, most of the objects of interest (IR objects) intentionally do not have method functions.

While there is certainly some complexity being brought in via AST parsing, the goal of a clear Pythonic syntax (defined by ourselves) is manageable and aligned with the first-class Python support philosophy. Our other core philosophy is to not get in the way of developers and use cases: if the introduction of the Python frontend hampered a developer's ability to define a new custom object, or to port an application to resource-constrained devices and/or languages, then we would need to think more carefully about it. My understanding is that the current proposal does not impose constraints in that regard.

Moreover, the explicit design choice of in-place generation (i.e. the clang-format approach) instead of full generation greatly shortens the path for adoption and transition. The codebase can stand without the schema tool and continue to add objects manually. The annotated regions get generated (and checked via a linting pass) as we gradually add objects that require schema generation. The generated code is checked in as part of the codebase, alleviating the concern about the complexity of a full generator system. While I understand that there may be a desire to push for a full-fledged generator, we feel that the strong need for customization and gradual adoption makes this path a better one, with less resistance.

--- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-tvm-object-schema-dsl/7930/14) to respond. You are receiving this because you enabled mailing list mode. To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/43508a5383efea3e5eecb7720ab0e626a9da5a875fbf9aa50909b31533e1ef8f).
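To make the fields-only scope discussed above a little more concrete, here is a purely hypothetical sketch of what declaring just the data layout of an IR-style node in Python could look like. This is not the syntax proposed in the RFC; the `declare_object` decorator, the node name, and the field types are invented for illustration only.

```
# Purely hypothetical sketch: a fields-only schema declaration in Python.
# `declare_object`, `HypotheticalBinaryOpNode`, and the field types are
# invented here and are not TVM APIs or the RFC's actual syntax.
from dataclasses import dataclass
from typing import Optional


def declare_object(type_key: str):
    """Stand-in for a schema-tool decorator that records the type key and the
    ordered field layout, which an in-place generator could later expand into
    an annotated C++ region (clang-format style)."""
    def wrap(cls):
        cls.type_key = type_key
        cls.field_layout = list(cls.__annotations__.items())
        return cls
    return wrap


@declare_object("ir.HypotheticalBinaryOp")
@dataclass
class HypotheticalBinaryOpNode:
    # Only the data layout is declared here; method definitions and language
    # bindings stay hand-written in the target language, matching the scope
    # argued for in the post above.
    a: "PrimExpr"
    b: "PrimExpr"
    span: Optional["Span"] = None
```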
[Apache TVM Discuss] [Development] Quantization and 3D convolution
Hello @OValery16, I believe the issue you are encountering is that you are calling `te.thread_axis("threadIdx.z")` multiple times. Instead, can you try creating the thread axis once with `thread_z = te.thread_axis("threadIdx.z")` and then using it like so: `s[output].bind(s[output].fuse(tf, td), thread_z)`? I think you'll also have to do this for `threadIdx.x` and `threadIdx.y`. --- [Visit Topic](https://discuss.tvm.apache.org/t/quantization-and-3d-convolution/8338/3) to respond. You are receiving this because you enabled mailing list mode. To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/04bec62e3dd05fbf587182d52cf707f4365f9c20dfe5caf3a796c4e5f3514006).
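As a hedged illustration of the suggestion in the reply above, here is a minimal, self-contained sketch using a toy elementwise compute rather than the actual conv3d schedule; the shapes, split factor, and stage names are made up.

```
# Minimal sketch of reusing a single IterVar per CUDA thread tag; the compute
# is a toy stand-in for the conv3d output stage in the original schedule.
import tvm
from tvm import te

A = te.placeholder((4, 4, 64), name="A")
output = te.compute((4, 4, 64), lambda d, h, w: A[d, h, w] * 2, name="output")
s = te.create_schedule(output.op)

# Create each thread axis exactly once...
block_x = te.thread_axis("blockIdx.x")
thread_z = te.thread_axis("threadIdx.z")
thread_x = te.thread_axis("threadIdx.x")

td, th, tw = s[output].op.axis
# ...and bind fused/split axes to those same objects, instead of calling
# te.thread_axis("threadIdx.z") again at every bind site.
s[output].bind(s[output].fuse(td, th), thread_z)
wo, wi = s[output].split(tw, factor=32)
s[output].bind(wo, block_x)
s[output].bind(wi, thread_x)

# Lowering does not require a GPU and shows the resulting thread bindings.
print(tvm.lower(s, [A, output], simple_mode=True))
```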
[Apache TVM Discuss] [Development] [VTA] Workaround for Autotuning with One PYNQ Z1 Board
May I ask what version of the code (incubator-tvm) you used? --- [Visit Topic](https://discuss.tvm.apache.org/t/vta-workaround-for-autotuning-with-one-pynq-z1-board/8091/3) to respond. You are receiving this because you enabled mailing list mode. To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/816b39822b01e2b7fb8fe09b62961ed40a20e97c178191d4f2b669ded4ad21c8).
[Apache TVM Discuss] [Development] [VTA] Workaround for Autotuning with One PYNQ Z1 Board
The 0.7 release (2020-10-02). --- [Visit Topic](https://discuss.tvm.apache.org/t/vta-workaround-for-autotuning-with-one-pynq-z1-board/8091/4) to respond. You are receiving this because you enabled mailing list mode. To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/15b960f55c509c1dacab267fb55a8c6f1ed19ded83986727512a5c85bc5881cb).