Hi all,
I am trying to improve quantized performance for memory-bound operators (e.g., depthwise or 1x1 convolutions with small shapes).

### Bottom line question

Is there any way to know the strategy picked by the autotuner during the legalization pass of a quantized convolution (qnn.conv2d)?

### Long story

In general, for any int8 -> int32 convolution there are two strategies to follow:

* Convert to int16, subtract the offsets, and then execute conv2d + requantization.
* Stay in int8 and use some magic instruction to compute the int8 -> int32 convolution. This introduces the evaluation of 4 terms: Term 1 is the core conv2d (int8 -> int32), while Terms 2-4 are the offset contributions (see [here](https://github.com/apache/incubator-tvm/blob/ef6e52f191888ee2a5f2221bde3b69391766903f/src/relay/qnn/op/convolution.cc#L542)). They come from expanding conv2d(data - data_zp, weight - weight_zp): the data*weight product is Term 1, the two cross terms scaled by the zero points are Terms 2 and 3, and the constant data_zp*weight_zp contribution is Term 4.

In theory, the int8 approach should outperform the int16 one, but for memory-bound operators the additional Terms 2-4 can hurt performance (I have situations where Term 2 alone takes as long as the nn.conv2d itself).

To get the best of both worlds, we should implement both strategies and try them both. At the moment, this is (I think) not possible in TVM. Indeed, the decision to convert to int16 (and then subtract the offsets) happens during [the legalization pass](https://github.com/apache/incubator-tvm/blob/main/python/tvm/relay/qnn/op/legalizations.py), i.e., when the qnn.conv2d is lowered to a plain nn.conv2d.

So, back to the main question: is there a way to know the autotuner's strategy during the legalization pass? The ideal code would be:

```
@qnn_conv2d_legalize.register("arm_cpu")
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    if strategy == "conv2d_int16":
        convert_to_int16(data)
    else:
        convert_to_int8(data)
```
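For concreteness, here is a slightly fuller sketch of that hook. Note that `get_selected_strategy()` is purely hypothetical (it is exactly the query I am asking about), the input unpacking assumes the `(data, kernel, zero points, scales)` order used by the existing helpers in legalizations.py, and the int16 branch just mirrors what the pass does unconditionally today:

```
from tvm import relay
from tvm.relay.qnn.op.legalizations import qnn_conv2d_legalize


def get_selected_strategy():
    # Hypothetical: no such query exists today -- the autotuner's choice
    # is not visible at legalization time. This is the missing piece.
    raise NotImplementedError


@qnn_conv2d_legalize.register("arm_cpu")
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    data, kernel, in_zp, kernel_zp, _, _ = inputs
    if get_selected_strategy() == "conv2d_int16":
        # int16 strategy: widen and fold the zero points in up front, so
        # the op lowers to a plain nn.conv2d with no offset terms left.
        shifted_data = relay.subtract(
            relay.cast(data, "int16"), relay.cast(in_zp, "int16"))
        shifted_kernel = relay.subtract(
            relay.cast(kernel, "int16"), relay.cast(kernel_zp, "int16"))
        new_attrs = {k: attrs[k] for k in attrs.keys()}
        return relay.nn.conv2d(shifted_data, shifted_kernel, **new_attrs)
    # int8 strategy: return None to leave qnn.conv2d untouched, so its
    # canonicalization emits Term 1 plus the offset Terms 2-4.
    return None
```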
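The only genuinely missing piece in the sketch is `get_selected_strategy()`: everything else can be written with existing Relay ops. As far as I can tell, legalization runs before any strategy selection happens, so the information simply is not available at that point in the pipeline. If there is a way to reach it (or an accepted pattern for deferring this choice until tuning time), I would be glad to hear it.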