All of the above `qnn` ops will be lowered to existing Relay primitive ops by a Relay pass (for example, using the ForwardRewrite infrastructure). For example, `relay.op.qnn.conv2d` can be lowered to

~~~
fn (%quantized_data: Tensor[(2, 1, 2, 4), uint8], %weight: Tensor[(3, 1, 2, 2), uint8]) -> Tensor[(2, 3, 1, 3), uint8] {
  %0 = nn.conv2d(%quantized_data, %weight, kernel_size=[2, 2], out_dtype="int32")
  %1 = cast(%0, dtype="float32")
  %2 = multiply(%1, 0.25098f)
  %3 = round(%2)
  %4 = cast(%3, dtype="int32")
  %5 = clip(%4, a_min=0, a_max=255)
  cast(%5, dtype="uint8")
}
~~~
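For illustration, here is a minimal sketch of how that lowered graph could be constructed with the Relay Python API. The helper name `lower_qnn_conv2d` and the hard-coded requantization scale `0.25098` are hypothetical, taken from the example above; the actual pass would derive the scale from the op's quantization parameters rather than receive it as a plain float.

~~~python
import tvm
from tvm import relay

def lower_qnn_conv2d(quantized_data, weight, requant_scale):
    # Hypothetical helper: builds the lowered graph from the example above.
    # Accumulate in int32 so the uint8 x uint8 products do not overflow.
    conv = relay.nn.conv2d(quantized_data, weight,
                           kernel_size=(2, 2), out_dtype="int32")
    # Requantize: scale in float32, round, then clamp back to the uint8 range.
    scaled = relay.multiply(relay.cast(conv, "float32"),
                            relay.const(requant_scale, "float32"))
    rounded = relay.cast(relay.round(scaled), "int32")
    clipped = relay.clip(rounded, a_min=0, a_max=255)
    return relay.cast(clipped, "uint8")

data = relay.var("quantized_data", shape=(2, 1, 2, 4), dtype="uint8")
weight = relay.var("weight", shape=(3, 1, 2, 2), dtype="uint8")
func = relay.Function([data, weight],
                      lower_qnn_conv2d(data, weight, 0.25098))
print(func)  # prints a Relay function matching the lowered form above
~~~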
I have yet to understand what needs to be done for softmax. Will have to look at a quantized model to understand.