Relevant QNN Dialect RFC - #3591

Some QNN operators, like Requantize and Conv2D, are more amenable to going through a C++ lowering pass. A few factors that make a C++ implementation preferable are: the new operator is conceptually very different from existing operators (e.g. Requantize), the input/output TensorType needs to be known for lowering, a lot of type checking is required, etc.
However, as we start adding more QNN operators, we should try to reduce the engineering effort needed to add a new one. Here, we propose a way to reduce the number of additional lines required when the lowering is straightforward: we still add a new QNN operator, but it exists only in Python, and we directly return the lowered sequence from Python.

~~~
from tvm import relay

# QNN max_pool2d can be lowered to cast, subtract and nn.max_pool2d.
# This operator lives in the qnn namespace.
def max_pool2d(quantized_data, input_zero_point, pool_size=(1, 1), strides=(1, 1),
               padding=(0, 0), layout="NCHW", ceil_mode=False):
    casted_data = relay.cast(quantized_data, dtype="int32")
    shifted_data = relay.subtract(casted_data, relay.const(input_zero_point, "int32"))
    return relay.nn.max_pool2d(shifted_data, pool_size=pool_size, strides=strides,
                               padding=padding, layout=layout, ceil_mode=ceil_mode)
~~~

* Therefore we do not add any new code in the C++ files, making it easier to add a new *simple* operator. The Python operator can also be shared amongst the framework parsers.
* Many operators are pretty simple - qnn.concat can be converted to a requantize on each input followed by nn.concat, qnn.split can be converted to nn.split followed by a requantize on each output, etc. Avg_pool2d and Relu (essentially most unary compute operations) are also very simple and can be lowered in this manner. A sketch of one such lowering is included at the end of this post.

@tqchen @yzhliu @FrozenGene @shoubhik Thanks @rankyung-hong for prototyping and helping write the doc.
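To illustrate the point about simple operators, here is a minimal, hypothetical sketch of how qnn.concatenate could be expressed purely in Python: requantize each input into the output quantization domain, then call the regular relay.concatenate. It assumes a requantize operator is already exposed under relay.qnn.op; the exact parameter names and signature below are illustrative, not the finalized QNN API.

~~~
from tvm import relay

# Hypothetical sketch: qnn.concatenate lowered to a requantize on each input
# followed by a regular relay.concatenate. Parameter names are illustrative.
def concatenate(quantized_data, input_scales, input_zero_points,
                output_scale, output_zero_point, axis=0):
    requantized = []
    for data, scale, zero_point in zip(quantized_data, input_scales, input_zero_points):
        # Bring every input into the output's quantization domain
        # (relay.qnn.op.requantize signature assumed for illustration).
        requantized.append(relay.qnn.op.requantize(data,
                                                   input_scale=scale,
                                                   input_zero_point=zero_point,
                                                   output_scale=output_scale,
                                                   output_zero_point=output_zero_point))
    return relay.concatenate(requantized, axis=axis)
~~~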