Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
@FrozenGene, a clarifying question on your comment above: if we pass in the output scale and shift, can we not compute the int32 -> int8 conversion by simply adding more nodes in the graph? -- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-502376311
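To make the question concrete, the "extra nodes" would look roughly like the sketch below, built only from plain Relay ops. The function name and the example quantization parameters are made up for illustration and are not part of any proposed API.

```python
import tvm
from tvm import relay

def requantize_int32_to_int8(acc, input_scale, kernel_scale,
                             output_scale, output_zero_point):
    """Scale an int32 accumulator down to int8 using ordinary Relay nodes.

    `acc` is the int32 conv2d accumulator; the scales and zero point stand in
    for the quantization parameters a frontend would pass in (hypothetical values).
    """
    # Effective multiplier that maps the accumulator into the output scale.
    multiplier = relay.const(input_scale * kernel_scale / output_scale, "float32")
    x = relay.cast(acc, "float32") * multiplier
    x = relay.round(x) + relay.const(float(output_zero_point), "float32")
    # Saturate to the int8 range before the narrowing cast.
    x = relay.clip(x, a_min=-128.0, a_max=127.0)
    return relay.cast(x, "int8")

acc = relay.var("acc", shape=(1, 16, 7, 7), dtype="int32")
out = requantize_int32_to_int8(acc, 0.05, 0.02, 0.1, 3)
print(relay.Function([acc], out))
```

The point of the sketch: the float rescaling, zero-point shift, clipping, and narrowing cast are all ordinary graph nodes, so in principle no dedicated requantize kernel is required.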
Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
> Thanks. Let's lay down the high-level API design for some of the quantized operators. A large portion of this is coming from the following relevant discussions. Thanks to @jackwish, @FrozenGene and @jnorwood for sharing their experiences with quantization, and also @shoubhik for helping design this RFC.
>
> * [Discussion](https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651)
>
> Other non-TVM related links that were used to understand quantization
>
> * GemmLowP - [Doc](https://github.com/google/gemmlowp/blob/master/doc/quantization.md)
> * TFLite reference [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/reference/conv.h#L101-L182)
>
> **Covered frameworks for now** - TFLite and MxNet
> **Target network for now** - Inception V3 from TFLite. (I will create one for MxNet)
> **Target platforms for now** - ARM and Intel (will create a separate issue as the project progresses)
>
> **List of required operators** - quantize, quantized_conv2d, quantized_relu, quantized_pool2d, quantized_fully_connected, quantized_concat, dequantize
>
> It will be good if we can agree on the Relay ops - their inputs/outputs and attributes. The initial proposal for the quantize, quantized_conv2d and dequantize ops is as follows (other quantized_* operators will be along the same lines as quantized_conv2d).
>
> ## Op quantize
> ```python
> def quantize(data, scale, zero_point, out_dtype):
>     """
>     Quantize takes the scale and zero_point attributes and quantizes the
>     FP32 input data to an int8/uint8 tensor.
>
>     Parameters
>     ---
>     data: FP32 tensor
>         The input tensor in FP32.
>
>     scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the int8 values back to FP32.
>
>     zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the distribution.
>
>     out_dtype: String
>         The dtype of the output. Can only be int8/uint8.
>
>     Returns
>     ---
>     quantized_data: int8/uint8 tensor
>         The quantized tensor.
>     """
> ```
>
> Key points to discuss
>
> * The scale and zero_point calculations happen outside the relay graph, i.e., the framework parsers will have to compute the scale and offset if only min and max are provided. [Reference implementation](https://github.com/tensorflow/tensorflow/blob/22e458382d3001a0cda4e594decf175f2387475e/tensorflow/lite/kernels/internal/quantization_util.h#L28-L99) in TFLite. This can also be thought of as a framework parser util where we can handle min/max, symmetric/asymmetric, etc., and generate the scale and zero_point the way each framework handles them.
>
> ## Op quantized_conv2d
> ```python
> def quantized_conv2d(quantized_data, quantized_kernel,
>                      input_scale, input_zero_point,
>                      kernel_scale, kernel_zero_point,
>                      output_scale, output_zero_point,
>                      out_dtype,
>
>                      # All the old remaining ones from conv2d
>                      strides=(1, 1),
>                      padding=(0, 0),
>                      dilation=(1, 1),
>                      groups=1,
>                      channels=None,
>                      kernel_size=None,
>                      data_layout="NCHW",
>                      kernel_layout="OIHW",
>                      out_layout=""):
>     """
>     Quantize takes the scale and zero_point attributes and quantizes the
>     FP32 input data to an int8/uint8 tensor. The scale and zero_point
>     calculations happen outside the relay graph, i.e., the framework parsers
>     will have to compute the scale and offset if only min and max are provided.
>
>     Parameters
>     ---
>     quantized_data: int8/uint8 tensor
>         The quantized input tensor in int8/uint8.
>
>     quantized_kernel: int8/uint8 tensor
>         The quantized kernel tensor in int8/uint8.
>
>     input_scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the quantized_data int8 values back to FP32.
>
>     input_zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the quantized_data distribution.
>
>     kernel_scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the quantized_kernel int8 values back to FP32.
>
>     kernel_zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the quantized_kernel distribution.
>
>     output_scale: FP32 scalar (an attribute of the op)
>         The output scale is set during the quantization process us
> ```
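As a side note on the "framework parser utils" point above: the asymmetric min/max to scale/zero_point computation that the parsers would perform is roughly the sketch below. The helper name and the uint8 range are assumptions for illustration; the linked TFLite quantization_util code is the authoritative reference and handles more corner cases.

```python
import numpy as np

def compute_scale_zero_point(real_min, real_max, qmin=0, qmax=255):
    """Map a real-valued [min, max] range onto a uint8 [qmin, qmax] range."""
    # Most frameworks nudge the range so that real 0.0 is exactly representable.
    real_min, real_max = min(real_min, 0.0), max(real_max, 0.0)
    scale = (real_max - real_min) / (qmax - qmin)
    # zero_point is the quantized integer that represents real 0.0.
    zero_point = int(np.clip(round(qmin - real_min / scale), qmin, qmax))
    return scale, zero_point

# Example ranges: a ReLU6-like activation and a symmetric tanh-like one.
print(compute_scale_zero_point(0.0, 6.0))    # scale ~ 0.0235, zero_point 0
print(compute_scale_zero_point(-1.0, 1.0))   # scale ~ 0.0078, zero_point 128
```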
Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
> > We need to add `in_dtype` in the dequantize op as the calculations will be different, especially the range to use.
>
> Guess the input tensor has such information already?

@jackwish, the input data is generally an `Expr`, which can be a `Var`, an `IntImm`, or some other type of `Expr`. How will I get `in_dtype` from an `Expr`?

-- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-505254571
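For what it's worth, one common way to recover the dtype of an arbitrary `Expr` is to run type inference and read `checked_type` off the annotated expression. A minimal sketch follows; the exact module and pass spellings vary between TVM versions, so treat it as illustrative.

```python
import tvm
from tvm import relay

# Build a small function whose body is "some Expr" whose dtype we want back.
x = relay.var("x", shape=(4,), dtype="uint8")
func = relay.Function([x], relay.add(x, relay.const(1, "uint8")))

# Type inference annotates every sub-expression with checked_type.
mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)

body = mod["main"].body
print(body.checked_type.dtype)   # -> "uint8"
```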
[dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
The purpose of this PR is to dive deep into the design of the quantized ops. To start the discussion I have implemented the quantize and dequantize ops, which are easy to implement. There is one more such [PR](https://github.com/dmlc/tvm/issues/2351), but there the conversation has meandered towards the implementation of quantized convolution. The questions we want to address are:

1. Is this design the correct way to incorporate quantized ops?
2. Are the abstractions introduced in this PR appropriate?

You can view, comment on, or merge this pull request online at: https://github.com/dmlc/tvm/pull/3457

-- Commit Summary --

* [Relay] [Quantization] WIP - Prototyping Quantize and Dequantize operator with type infer type, lowering and test cases.
* [Relay] [Quantization] WIP - Fixing typos and removing redundant code.

-- File Changes --

* A include/tvm/relay/attrs/nn_quantize.h (67)
* A include/tvm/relay/quantize_util.h (98)
* M python/tvm/relay/op/nn/__init__.py (1)
* A python/tvm/relay/op/nn/_make_quantize.py (20)
* A python/tvm/relay/op/nn/_quantize.py (73)
* M python/tvm/relay/quantize/__init__.py (1)
* A src/relay/op/nn/dequantize.cc (78)
* A src/relay/op/nn/quantize_op.cc (91)
* A src/relay/pass/quantize_rewrite.cc (93)
* A tests/python/unittest/test_quantized_ops.py (117)

-- Patch Links --

* https://github.com/dmlc/tvm/pull/3457.patch
* https://github.com/dmlc/tvm/pull/3457.diff

-- View it on GitHub: https://github.com/dmlc/tvm/pull/3457
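For readers skimming the PR, the core idea behind the lowering pass is that both ops decompose into existing Relay arithmetic. The sketch below shows only the intended decomposition; it is not the code in `quantize_rewrite.cc`, and the helper names and default parameters are made up for illustration.

```python
from tvm import relay

def lower_quantize(data, scale, zero_point, out_dtype="uint8", qmin=0.0, qmax=255.0):
    # quantize(x) ~= cast(clip(round(x / scale) + zero_point, qmin, qmax), out_dtype)
    x = relay.round(data / relay.const(scale, "float32"))
    x = x + relay.const(float(zero_point), "float32")
    x = relay.clip(x, a_min=qmin, a_max=qmax)
    return relay.cast(x, out_dtype)

def lower_dequantize(qdata, scale, zero_point):
    # dequantize(q) ~= scale * (cast(q, float32) - zero_point)
    x = relay.cast(qdata, "float32") - relay.const(float(zero_point), "float32")
    return x * relay.const(scale, "float32")

# Round-trip a float tensor through the two lowered forms.
x = relay.var("x", shape=(2, 2), dtype="float32")
print(relay.Function([x], lower_dequantize(lower_quantize(x, 0.1, 128), 0.1, 128)))
```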
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
Closed #3457. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3457#event-2467657739
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
Rebased to new PR #3512. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3457#issuecomment-509421969
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@tqchen @FrozenGene @ZihengJiang @zhiics @wweic @eqy -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-509422639
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@FrozenGene and @tqchen, any other major comments for the PR? -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-510561960
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
> Mainly organizational issues, please make things consistent with what was discussed in #3531

I have addressed the namespace issues and have followed the same convention as #3531 in the new commit.

-- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-511043319
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@liangfu, I have made the changes you suggested. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-512443841
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
Closed #3512. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#event-2510690970
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
There are quite a lot of changes here that are dependent on #3531. I am closing the PR for now and will reopen it once #3531 is pushed. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-515195797
Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)
@jackwish, I want to get my understanding correct. When you say

> I was looking into PR #3531 and #3512, and noticed that the PRs are going to support 32 bits quantization.

are you talking about the inputs or outputs of the quantize/dequantize ops being int32? Because the current implementation for

1. Quantize - limits the input to float32 and the output to (u)int8.
2. Dequantize - limits the input to (u)int8 and the output to float32.

Or are you suggesting we should support a higher number of bits (>16) for these ops?

-- View it on GitHub: https://github.com/dmlc/tvm/issues/3591#issuecomment-515200727
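In code, the restriction described above amounts to a dtype check along these lines. The helper names are made up for illustration and are not the actual checks in the PR.

```python
QUANTIZED_DTYPES = ("int8", "uint8")

def check_quantize_dtypes(in_dtype, out_dtype):
    """Current quantize op: float32 in, (u)int8 out."""
    assert in_dtype == "float32", "quantize expects a float32 input"
    assert out_dtype in QUANTIZED_DTYPES, "quantize can only produce int8/uint8"

def check_dequantize_dtypes(in_dtype, out_dtype):
    """Current dequantize op is the inverse: (u)int8 in, float32 out."""
    assert in_dtype in QUANTIZED_DTYPES, "dequantize expects an int8/uint8 input"
    assert out_dtype == "float32", "dequantize can only produce float32"

check_quantize_dtypes("float32", "uint8")     # accepted
check_dequantize_dtypes("uint8", "float32")   # accepted
```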
Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)
> > @jackwish, I want to get my understanding correct. When you say
> > > I was looking into PR #3531 and #3512, and noticed that the PRs are going to support 32 bits quantization.
> > are you talking about the inputs or outputs of the quantize/dequantize ops being int32? Because the current implementation for
> > 1. Quantize - limits the input to float32 and the output to (u)int8.
> > 2. Dequantize - limits the input to (u)int8 and the output to float32.
> > Or are you suggesting we should support a higher number of bits (>16) for these ops?
>
> @shoubhik I was saying to limit to int8. I know your PR only restricts to int8, while PR #3531 seems to be trying to enable int8/16/32. I moved the discussion here because I saw the two PRs share the same code but do not seem to be consistent in their quantization approach. Thanks for helping to clarify.

Thanks for the explanation.

-- View it on GitHub: https://github.com/dmlc/tvm/issues/3591#issuecomment-515285922
Re: [apache/incubator-tvm] [DEV] TVM v0.7 Roadmap (#4845)
What is the expected timeline for this release? What are the chances of it happening in May? -- View it on GitHub: https://github.com/apache/incubator-tvm/issues/4845#issuecomment-612762262