All of the above `qnn` ops will be lowered to existing Relay primitive ops by a 
Relay pass (for example, using the ForwardRewrite infrastructure). For instance, 
`relay.op.qnn.conv2d` can be lowered to
~~~
fn (%quantized_data: Tensor[(2, 1, 2, 4), uint8],
    %weight: Tensor[(3, 1, 2, 2), uint8]) -> Tensor[(2, 3, 1, 3), uint8] {
  %0 = nn.conv2d(%quantized_data, %weight, kernel_size=[2, 2], out_dtype="int32")
  %1 = cast(%0, dtype="float32")
  %2 = multiply(%1, 0.25098f)
  %3 = round(%2)
  %4 = cast(%3, dtype="int32")
  %5 = clip(%4, a_min=0, a_max=255)
  cast(%5, dtype="uint8")
}
~~~
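To make the requantization steps in the lowered graph concrete, here is a small NumPy sketch (not TVM code; the function name `requantize` and the sample accumulator values are my own) that mimics the cast → multiply → round → cast → clip → cast sequence on an int32 conv accumulator, using the same scale constant (0.25098) as the example above:

~~~python
import numpy as np

def requantize(acc_int32, scale=0.25098):
    """Mimic the post-conv2d ops from the lowered Relay graph above.

    Note: np.round rounds half to even, which can differ from
    round-half-away-from-zero at exact .5 boundaries; the sample
    values here avoid that edge case.
    """
    x = acc_int32.astype(np.float32)   # cast(%0, dtype="float32")
    x = x * np.float32(scale)          # multiply(%1, 0.25098f)
    x = np.round(x)                    # round(%2)
    x = x.astype(np.int32)             # cast(%3, dtype="int32")
    x = np.clip(x, 0, 255)             # clip(%4, a_min=0, a_max=255)
    return x.astype(np.uint8)          # cast(%5, dtype="uint8")

# Hypothetical int32 accumulator values from a conv2d with out_dtype="int32":
acc = np.array([-10, 0, 400, 2000], dtype=np.int32)
print(requantize(acc))  # negatives clamp to 0, large values saturate at 255
~~~

The clip to [0, 255] followed by the cast to uint8 is what keeps the requantized output within the unsigned 8-bit range.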
---------------------

I have yet to work out what needs to be done for softmax; I will have to look at 
a quantized model to understand.

View on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-507461088