Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
@FrozenGene, a clarifying question on your comment above: if we pass in the output scale and shift, can we not compute the int32 -> int8 conversion by simply adding more nodes in the graph? -- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-502376311
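To make the question concrete, the "extra nodes" would look roughly like the sketch below, built only from plain Relay ops. The function name and the example quantization parameters are made up for illustration and are not part of any proposed API.

```python
import tvm
from tvm import relay

def requantize_int32_to_int8(acc, input_scale, kernel_scale,
                             output_scale, output_zero_point):
    """Scale an int32 accumulator down to int8 using ordinary Relay nodes.

    `acc` is the int32 conv2d accumulator; the scales and zero point stand in
    for the quantization parameters a frontend would pass in (hypothetical values).
    """
    # Effective multiplier that maps the accumulator into the output scale.
    multiplier = relay.const(input_scale * kernel_scale / output_scale, "float32")
    x = relay.cast(acc, "float32") * multiplier
    x = relay.round(x) + relay.const(float(output_zero_point), "float32")
    # Saturate to the int8 range before the narrowing cast.
    x = relay.clip(x, a_min=-128.0, a_max=127.0)
    return relay.cast(x, "int8")

acc = relay.var("acc", shape=(1, 16, 7, 7), dtype="int32")
out = requantize_int32_to_int8(acc, 0.05, 0.02, 0.1, 3)
print(relay.Function([acc], out))
```

The point of the sketch: the float rescaling, zero-point shift, clipping, and narrowing cast are all ordinary graph nodes, so in principle no dedicated requantize kernel is required.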
Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
> Thanks. Let's lay down the high-level API design for some of the quantized operators. A large portion of this is coming from the following relevant discussions. Thanks to @jackwish, @FrozenGene and @jnorwood for sharing their experiences with quantization, and also @shoubhik for helping design this RFC.
>
> * [Discussion](https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651)
>
> Other non-TVM related links that were used to understand quantization
>
> * GemmLowP - [Doc](https://github.com/google/gemmlowp/blob/master/doc/quantization.md)
> * TFLite reference [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/reference/conv.h#L101-L182)
>
> **Covered frameworks for now** - TFLite and MxNet
> **Target network for now** - Inception V3 from TFLite. (I will create one for MxNet)
> **Target platforms for now** - ARM and Intel (will create a separate issue as the project progresses)
>
> **List of required operators** - quantize, quantized_conv2d, quantized_relu, quantized_pool2d, quantized_fully_connected, quantized_concat, dequantize
>
> It will be good if we can agree on the Relay ops - their inputs/outputs and attributes. The initial proposal for the quantize, quantized_conv2d and dequantize ops is as follows (other quantized_* operators will be along the same lines as quantized_conv2d).
>
> ## Op quantize
> ```python
> def quantize(data, scale, zero_point, out_dtype):
>     """
>     Quantize takes the scale and zero_point attributes and quantizes the
>     FP32 input data to an int8/uint8 tensor.
>
>     Parameters
>     ---
>     data: FP32 tensor
>         The input tensor in FP32.
>
>     scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the int8 values back to FP32.
>
>     zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the distribution.
>
>     out_dtype: String
>         The dtype of the output. Can only be int8/uint8.
>
>     Returns
>     ---
>     quantized_data: int8/uint8 tensor
>         The quantized tensor.
>     """
> ```
>
> Key points to discuss
>
> * The scale and zero_point calculations happen outside the relay graph, i.e., the framework parsers will have to compute the scale and offset if only min and max are provided. [Reference implementation](https://github.com/tensorflow/tensorflow/blob/22e458382d3001a0cda4e594decf175f2387475e/tensorflow/lite/kernels/internal/quantization_util.h#L28-L99) in TFLite. This can also be thought of as a framework parser util where we can handle min/max, symmetric/asymmetric, etc., and generate the scale and zero_point the way each framework handles them.
>
> ## Op quantized_conv2d
> ```python
> def quantized_conv2d(quantized_data, quantized_kernel,
>                      input_scale, input_zero_point,
>                      kernel_scale, kernel_zero_point,
>                      output_scale, output_zero_point,
>                      out_dtype,
>
>                      # All the old remaining ones from conv2d
>                      strides=(1, 1),
>                      padding=(0, 0),
>                      dilation=(1, 1),
>                      groups=1,
>                      channels=None,
>                      kernel_size=None,
>                      data_layout="NCHW",
>                      kernel_layout="OIHW",
>                      out_layout=""):
>     """
>     Quantize takes the scale and zero_point attributes and quantizes the
>     FP32 input data to an int8/uint8 tensor. The scale and zero_point
>     calculations happen outside the relay graph, i.e., the framework parsers
>     will have to compute the scale and offset if only min and max are provided.
>
>     Parameters
>     ---
>     quantized_data: int8/uint8 tensor
>         The quantized input tensor in int8/uint8.
>
>     quantized_kernel: int8/uint8 tensor
>         The quantized kernel tensor in int8/uint8.
>
>     input_scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the quantized_data int8 values back to FP32.
>
>     input_zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the quantized_data distribution.
>
>     kernel_scale: FP32 scalar (an attribute of the op)
>         The float scalar to scale the quantized_kernel int8 values back to FP32.
>
>     kernel_zero_point: Int32 zero point (an attribute of the op)
>         The zero point of the quantized_kernel distribution.
>
>     output_scale: FP32 scalar (an attribute of the op)
>         The output scale is set during the quantization process us
> ```
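As a side note on the "framework parser utils" point above: the asymmetric min/max to scale/zero_point computation that the parsers would perform is roughly the sketch below. The helper name and the uint8 range are assumptions for illustration; the linked TFLite quantization_util code is the authoritative reference and handles more corner cases.

```python
import numpy as np

def compute_scale_zero_point(real_min, real_max, qmin=0, qmax=255):
    """Map a real-valued [min, max] range onto a uint8 [qmin, qmax] range."""
    # Most frameworks nudge the range so that real 0.0 is exactly representable.
    real_min, real_max = min(real_min, 0.0), max(real_max, 0.0)
    scale = (real_max - real_min) / (qmax - qmin)
    # zero_point is the quantized integer that represents real 0.0.
    zero_point = int(np.clip(round(qmin - real_min / scale), qmin, qmax))
    return scale, zero_point

# Example ranges: a ReLU6-like activation and a symmetric tanh-like one.
print(compute_scale_zero_point(0.0, 6.0))    # scale ~ 0.0235, zero_point 0
print(compute_scale_zero_point(-1.0, 1.0))   # scale ~ 0.0078, zero_point 128
```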
Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)
> > We need to add `in_dtype` in the dequantize op as the calculations will be different, especially the range to use.
>
> Guess the input tensor has such information already?

@jackwish, the input data is generally an `Expr`, which can be a `Var`, an `IntImm`, or some other type of `Expr`. How will I get `in_dtype` from an `Expr`?

-- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-505254571
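For what it's worth, one common way to recover the dtype of an arbitrary `Expr` is to run type inference and read `checked_type` off the annotated expression. A minimal sketch follows; the exact module and pass spellings vary between TVM versions, so treat it as illustrative.

```python
import tvm
from tvm import relay

# Build a small function whose body is "some Expr" whose dtype we want back.
x = relay.var("x", shape=(4,), dtype="uint8")
func = relay.Function([x], relay.add(x, relay.const(1, "uint8")))

# Type inference annotates every sub-expression with checked_type.
mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)

body = mod["main"].body
print(body.checked_type.dtype)   # -> "uint8"
```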
[dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
The purpose of this PR is to dive deep into the design of the quantized ops. To start the discussion I have implemented the quantize and dequantize ops, which are easy to implement. There is one more such [PR](https://github.com/dmlc/tvm/issues/2351), but there the conversation has meandered towards the implementation of quantized convolution. The questions we want to address are:

1. Is this design the correct way to incorporate quantized ops?
2. Are the abstractions introduced in this PR appropriate?

You can view, comment on, or merge this pull request online at: https://github.com/dmlc/tvm/pull/3457

-- Commit Summary --

* [Relay] [Quantization] WIP - Prototyping Quantize and Dequantize operator with type infer type, lowering and test cases.
* [Relay] [Quantization] WIP - Fixing typos and removing redundant code.

-- File Changes --

* A include/tvm/relay/attrs/nn_quantize.h (67)
* A include/tvm/relay/quantize_util.h (98)
* M python/tvm/relay/op/nn/__init__.py (1)
* A python/tvm/relay/op/nn/_make_quantize.py (20)
* A python/tvm/relay/op/nn/_quantize.py (73)
* M python/tvm/relay/quantize/__init__.py (1)
* A src/relay/op/nn/dequantize.cc (78)
* A src/relay/op/nn/quantize_op.cc (91)
* A src/relay/pass/quantize_rewrite.cc (93)
* A tests/python/unittest/test_quantized_ops.py (117)

-- Patch Links --

* https://github.com/dmlc/tvm/pull/3457.patch
* https://github.com/dmlc/tvm/pull/3457.diff

-- View it on GitHub: https://github.com/dmlc/tvm/pull/3457
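For readers skimming the PR, the core idea behind the lowering pass is that both ops decompose into existing Relay arithmetic. The sketch below shows only the intended decomposition; it is not the code in `quantize_rewrite.cc`, and the helper names and default parameters are made up for illustration.

```python
from tvm import relay

def lower_quantize(data, scale, zero_point, out_dtype="uint8", qmin=0.0, qmax=255.0):
    # quantize(x) ~= cast(clip(round(x / scale) + zero_point, qmin, qmax), out_dtype)
    x = relay.round(data / relay.const(scale, "float32"))
    x = x + relay.const(float(zero_point), "float32")
    x = relay.clip(x, a_min=qmin, a_max=qmax)
    return relay.cast(x, out_dtype)

def lower_dequantize(qdata, scale, zero_point):
    # dequantize(q) ~= scale * (cast(q, float32) - zero_point)
    x = relay.cast(qdata, "float32") - relay.const(float(zero_point), "float32")
    return x * relay.const(scale, "float32")

# Round-trip a float tensor through the two lowered forms.
x = relay.var("x", shape=(2, 2), dtype="float32")
print(relay.Function([x], lower_dequantize(lower_quantize(x, 0.1, 128), 0.1, 128)))
```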
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
Closed #3457. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3457#event-2467657739
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)
Rebased to new PR #3512. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3457#issuecomment-509421969
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@tqchen @FrozenGene @ZihengJiang @zhiics @wweic @eqy -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-509422639
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@FrozenGene and @tqchen, any other major comments for the PR? -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-510561960
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
> Mainly organizational issues, please make things consistent with what was discussed in #3531

I have addressed the namespace issues and have followed the same convention as #3531 in the new commit.

-- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-511043319
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
@liangfu, I have made the changes you suggested. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-512443841
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
Closed #3512. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#event-2510690970
Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)
There are quite a lot of changes here that are dependent on #3531. I am closing the PR for now and will reopen it once #3531 is pushed. -- View it on GitHub: https://github.com/dmlc/tvm/pull/3512#issuecomment-515195797
Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)
@jackwish, I want to get my understanding correct. When you say

> I was looking into PR #3531 and #3512, and noticed that the PRs are going to support 32 bits quantization.

are you talking about the inputs or outputs of the quantize/dequantize ops being int32? Because the current implementation for

1. Quantize - limits the input to float32 and the output to (u)int8.
2. Dequantize - limits the input to (u)int8 and the output to float32.

Or are you suggesting we should support a higher number of bits (>16) for these ops?

-- View it on GitHub: https://github.com/dmlc/tvm/issues/3591#issuecomment-515200727
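In code, the restriction described above amounts to a dtype check along these lines. The helper names are made up for illustration and are not the actual checks in the PR.

```python
QUANTIZED_DTYPES = ("int8", "uint8")

def check_quantize_dtypes(in_dtype, out_dtype):
    """Current quantize op: float32 in, (u)int8 out."""
    assert in_dtype == "float32", "quantize expects a float32 input"
    assert out_dtype in QUANTIZED_DTYPES, "quantize can only produce int8/uint8"

def check_dequantize_dtypes(in_dtype, out_dtype):
    """Current dequantize op is the inverse: (u)int8 in, float32 out."""
    assert in_dtype in QUANTIZED_DTYPES, "dequantize expects an int8/uint8 input"
    assert out_dtype == "float32", "dequantize can only produce float32"

check_quantize_dtypes("float32", "uint8")     # accepted
check_dequantize_dtypes("uint8", "float32")   # accepted
```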
Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)
> > @jackwish, I want to get my understanding correct. When you say
> > > I was looking into PR #3531 and #3512, and noticed that the PRs are going to support 32 bits quantization.
> > are you talking about the inputs or outputs of the quantize/dequantize ops being int32? Because the current implementation for
> > 1. Quantize - limits the input to float32 and the output to (u)int8.
> > 2. Dequantize - limits the input to (u)int8 and the output to float32.
> > Or are you suggesting we should support a higher number of bits (>16) for these ops?
>
> @shoubhik I was saying to limit to int8. I know your PR only restricts to int8, while PR #3531 seems to be trying to enable int8/16/32. I moved the discussion here because I saw the two PRs share the same code but do not seem to be consistent in their quantization approach. Thanks for helping to clarify.

Thanks for the explanation.

-- View it on GitHub: https://github.com/dmlc/tvm/issues/3591#issuecomment-515285922
Re: [apache/incubator-tvm] [DEV] TVM v0.7 Roadmap (#4845)
What is the expected timeline for this release? What are the chances of it happening in May? -- View it on GitHub: https://github.com/apache/incubator-tvm/issues/4845#issuecomment-612762262