Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

Animesh Jain Mon, 22 Jul 2019 20:21:11 -0700

### QNN Conv2D operator

#### Tensorflow
~~~
tf.nn.quantized_conv2d(
    input,
    filter,
    min_input,
    max_input,
    min_filter,
    max_filter,
    strides,
    padding,
    out_type=tf.dtypes.qint32,
    dilations=[1, 1, 1, 1],
    name=None
)
~~~


#### MxNet
~~~
mxnet.symbol.contrib.quantized_conv(
data=None, 
weight=None,
bias=None,
min_data=None,
max_data=None,
min_weight=None,
max_weight=None, 
min_bias=None, 
max_bias=None, 
kernel=_Null, 
stride=_Null, 
dilate=_Null, 
pad=_Null, 
num_filter=_Null, 
num_group=_Null, 
workspace=_Null, 
no_bias=_Null, 
cudnn_tune=_Null, 
cudnn_off=_Null, 
layout=_Null, 
name=None, 
attr=None, 
out=None, **kwargs)
~~~

#### QNN Proposal 
~~~
def conv2d(data,
           weight,
           input_zero_point,
           kernel_zero_point,
           strides=(1, 1),
           padding=(0, 0),
           dilation=(1, 1),
           groups=1,
           channels=None,
           kernel_size=None,
           data_layout="NCHW",
           kernel_layout="OIHW",
           out_layout="",
           out_dtype="int32"):
    r"""Quantized 2D convolution.

    This operator convolves quantized weight with quantized data. The scale of
    the output quantized tensor is the product of the weight_scale and
    input_scale of the input quantized tensors. The zero point of the output
    quantized tensor is 0. By default, the dtype of output is int32. Please also
    refer to Requantize operator to understand how to scale back the int32
    output to (u)int8.


    Parameters
    ----------
    data : tvm.relay.Expr
        The input data to the operator.

    weight : tvm.relay.Expr
        The weight expressions.

    input_zero_point: int
           The zero point of the data distribution.

    kernel_zero_point: int
           The zero point of the quantized_kernel distribution.

    strides : tuple of int, optional
        The strides of convolution.

    padding : tuple of int, optional
        The padding of convolution on both sides of inputs before convolution.

    dilation : tuple of int, optional
        Specifies the dilation rate to be used for dilated convolution.

    groups : int, optional
        Number of groups for grouped convolution.

    channels : int, optional
        Number of output channels of this convolution.

    kernel_size : tuple of int, optional
        The spatial of the convolution weight.

    data_layout : str, optional
        Layout of the input.

    kernel_layout : str, optional
        Layout of the weight.

    out_layout : str, optional
        Layout of the output, by default, out_layout is the same as data_layout

    out_dtype : str, optional
        Specifies the output data type for mixed precision conv2d.

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
~~~



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3591#issuecomment-514037896

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

Reply via email to