Relevant QNN Dialect RFC - #3591
Some QNN operators, like Requantize and Conv2D, are more amenable to lowering
through a C++ pass. A C++ implementation seems better when the new operator is
conceptually very different from existing operators (e.g., Requantize), when
the input/output TensorType needs to be known for lowering, when the lowering
needs a lot of type checking, etc.
However, as we start adding more QNN operators, we should try to reduce the
engineering effort needed to add a new one. Here, we propose a way to reduce
the number of additional lines required to add a new operator when the
lowering is straightforward: we still add a new QNN operator, but it exists
only in Python, and we directly return the lowered sequence of Relay
operators from Python.
~~~
from tvm import relay

# QNN max_pool2d can be lowered to cast, subtract, and nn.max_pool2d.
# This operator lives in the qnn namespace.
def max_pool2d(quantized_data,
               input_zero_point,
               pool_size=(1, 1),
               strides=(1, 1),
               padding=(0, 0),
               layout="NCHW",
               ceil_mode=False):
    # Upcast to int32 so the zero-point subtraction cannot overflow.
    casted_data = relay.cast(quantized_data, dtype="int32")
    # Shift the quantized values by the input zero point.
    shifted_data = relay.subtract(casted_data,
                                  relay.const(input_zero_point, "int32"))
    # Max pooling is order-preserving, so the regular nn.max_pool2d can run
    # directly on the shifted integer data.
    return relay.nn.max_pool2d(shifted_data,
                               pool_size=pool_size,
                               strides=strides,
                               padding=padding,
                               layout=layout,
                               ceil_mode=ceil_mode)
~~~
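As a rough illustration of how such a Python-only operator would be consumed, the sketch below builds a small Relay graph with it. The shapes, dtype, and zero-point value are made-up; only `relay.var`, `relay.Function`, and the `max_pool2d` definition above come from the surrounding context.

~~~
# Hypothetical usage sketch: shapes, dtype, and the zero point are
# illustrative values, not part of this RFC.
from tvm import relay

data = relay.var("quantized_data", shape=(1, 3, 224, 224), dtype="uint8")
# Calling the Python-only operator returns the already-lowered Relay
# expression (cast -> subtract -> nn.max_pool2d); no C++ lowering pass runs.
out = max_pool2d(data, input_zero_point=128,
                 pool_size=(2, 2), strides=(2, 2))
func = relay.Function([data], out)
~~~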
* Therefore we don't add new code in C++ files, which makes it easier to add a
new *simple* operator. The operator can also be shared among the framework
parsers.
* Many operators are pretty simple - qnn.concat can be converted to a
requantize on each input followed by nn.concat, qnn.split can be converted to
nn.split followed by a requantize on each output, and so on. Avg_pool2d and
Relu (and many other unary compute operations) are also very simple and can be
lowered in this manner; a hedged sketch of such a lowering follows this list.
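To make the concat example concrete, here is a minimal sketch of what a Python-only qnn concatenate could look like. It assumes a qnn requantize operator (per RFC #3591) exposed as `relay.qnn.op.requantize` with scalar scale and zero-point arguments; the function name, parameter names, and call signature are assumptions for illustration, not the final API.

~~~
# Hedged sketch of a Python-only qnn concatenate. The requantize call and
# its parameter names are assumptions based on the Requantize RFC (#3591).
from tvm import relay

def concatenate(data_list, input_scales, input_zero_points,
                output_scale, output_zero_point, axis=0):
    # Requantize every input to the common output scale/zero point, then
    # fall back to the regular Relay concatenate.
    requantized = [
        relay.qnn.op.requantize(data,
                                input_scale=scale,
                                input_zero_point=zero_point,
                                output_scale=output_scale,
                                output_zero_point=output_zero_point)
        for data, scale, zero_point in zip(data_list,
                                           input_scales,
                                           input_zero_points)
    ]
    return relay.concatenate(requantized, axis=axis)
~~~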
@tqchen @yzhliu @FrozenGene @shoubhik
Thanks @rankyung-hong for prototyping and helping write the doc.