Re: [dmlc/tvm] [RFC][Graph Tuner] Graph level auto-tuning (#1585)

2019-03-29 Thread Zhao Wu
@kevinthesun Was the performance data collected with auto-tuning applied? According to the comments, 
it seems auto-tuning was not applied; if so, I wish the performance data could be updated.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/1585#issuecomment-477992195

Re: [dmlc/tvm] [RFC] Frontend layout transformation (#2519)

2019-04-17 Thread Zhao Wu
@tqchen I plan to support the TFLite NHWC data layout after my quantization work is 
upstreamed. However, NCHW has its advantages as described. We could have two 
options:

- Keep NCHW for TFLite and add one parameter named `layout` to `from_tflite`.

`layout` could be `NCHW` or `NHWC`; the default value could be discussed. This 
means we would support two data layouts in the TFLite frontend and leave the decision to the 
user. For example, if a user wants to use the `NCHW[x]c` schedule, or the model has 
`conv2d_transpose`, they may prefer the NCHW layout (see the sketch at the end of this message).

- Drop NCHW for TFLite and keep only the original TFLite NHWC layout.

Besides the TFLite frontend work, we would also need some work in AutoTVM (support for 
NHWC convolution tuning) and support for the `spatial pack` NHWC schedule on ARM CPU.

I would like to hear your comments.
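
To make option 1 concrete, a hypothetical usage sketch (the `layout` argument does not 
exist in `from_tflite`; it is shown purely for illustration, and the model / shape values 
are placeholders):
```python
from tvm import relay

# tflite_model is a loaded tflite.Model; shapes and dtypes are placeholders.
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "float32"},
    layout="NCHW",  # proposed new argument: "NCHW" or "NHWC"
)
```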

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2519#issuecomment-484129311

Re: [dmlc/tvm] [RFC] Frontend layout transformation (#2519)

2019-04-18 Thread Zhao Wu
@srkreddy1238 @yzhliu Thanks for the comments!

If all of you agree, I will make the TFLite frontend use NHWC instead of converting to NCHW.

@yzhliu Yes, the quantization support has not been upstreamed yet. It involves many 
changes; I plan to upstream it in the 0.6 development cycle. My original plan was to add 
TFLite NHWC support later, the reason being that we could first leverage the existing 
NCHW auto-tuning and see the performance of the quantized model. Our initial result is that 
the quantized model is about 30% faster than FP32 on MobileNet V1 using spatial pack. We 
also find this is about the limit for the quantized model with plain schedules; we could 
tensorize `q_conv2d` to get better performance. However, if we change the layout from 
NCHW to NHWC, we will have some additional work to do, for example auto-tuning support 
for NHWC (including conv2d and depthwise convolution). Alright, I can start with the 
TFLite NHWC support and upstream it before the quantization part, because this work is 
much easier than the quantization part.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2519#issuecomment-484746869

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-14 Thread Zhao Wu
Hey @anijain2305, thanks for your interest. Currently I am working on 
https://github.com/dmlc/tvm/pull/3141; after that, I will start on it. BTW, our 
internal support is based on NNVM and is complete: we get the same results as TFLite 
and better performance than TFLite. However, I will have to spend some time translating 
it to Relay when making the PR. I also have to say that I am busy this month with our 
product development, and the work will go through my company's open-source process. 
I will @ you when that PR is ready.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-492482538

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-28 Thread Zhao Wu
@anijain2305 

For the `q_conv2d`, we will add two more arguments.
```python
  output_min=0, 
  output_max=0
```
These will be used to restrict the output range, which can be calculated 
beforehand; see TFLite's `CalculateActivationRangeUint8` function.
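
To illustrate how these two arguments would be used, here is a minimal sketch of the 
requantize-and-clamp step on the int32 accumulator (pure-Python pseudocode; the real 
multiplier is `input_scale * kernel_scale / output_scale` as in TFLite, and all names 
here are illustrative):
```python
def requantize_to_uint8(acc_int32, real_multiplier, output_zero_point,
                        output_min=0, output_max=255):
    # scale the int32 accumulator back into the output quantized domain
    q = int(round(acc_int32 * real_multiplier)) + output_zero_point
    # clamp into [output_min, output_max]; 0/255 when there is no fused
    # activation, quantize(0)/quantize(6) when RELU6 is fused, etc.
    return min(max(q, output_min), output_max)
```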

From my experience, we don't need `q_relu`, but we do need `q_add` / `q_concatenate` and 
so on. I suggest we use the quantized `MobilenetV2` model as an example; it is used 
very widely and contains the common ops we should consider, for example `depthwise 
convolution / add / pool` and so on.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-496763873

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Zhao Wu
> > For the `q_conv2d`, we will add two more arguments.
> > ```python
> >   output_min=0, 
> >   output_max=0
> > ```
> > 
> > 
> > These will be used for restrict the output range, which could be calculated 
> > previously.
> 
> I see what you are saying, but I am not sure if this is the right approach. 
> In my opinion, it will be better to put it out of conv. The reason we have 
> these 2 extra min/maxes is because of fused activation in TFLite. It seems 
> better to keep it separate so that both MxNet and TFLite can share 
> quantized_conv2d. In case of TFLite, when we see a fused conv, we can add one 
> more clamp operator in the sequence of ops at the end.

No matter whether we have a fused activation function, we always need output_min 
/ output_max. The conv result is int32, but we need a uint8 result, so we must 
restrict the int32 value to the uint8 range. If there is no fused activation 
function (and in many TFLite quantized models there isn't one), output_min / 
output_max will be 0 / 255 to restrict the int32 result. If we have relu6, 
output_min / output_max will correspond to 0 / 6. So I think we are better off 
putting these two into the conv arguments. We would also avoid producing another 
clamp: it is simply applied during conv2d's requantize (int32 -> uint8) step, 
which is natural.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-497031857

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Zhao Wu
> > > > For the `q_conv2d`, we will add two more arguments.
> > > > ```python
> > > >   output_min=0, 
> > > >   output_max=0
> > > > ```
> > > > 
> > > > 
> > > > These will be used for restrict the output range, which could be 
> > > > calculated previously.
> > > 
> > > 
> > > I see what you are saying, but I am not sure if this is the right 
> > > approach. In my opinion, it will be better to put it out of conv. The 
> > > reason we have these 2 extra min/maxes is because of fused activation in 
> > > TFLite. It seems better to keep it separate so that both MxNet and TFLite 
> > > can share quantized_conv2d. In case of TFLite, when we see a fused conv, 
> > > we can add one more clamp operator in the sequence of ops at the end.
> > 
> > 
> > No matter whether we have fused activation function, we always need 
> > output_min / output_max. Because we will get conv int32 result, but we will 
> > need uint8 result. So we must restrict int32 to uint8. If we don't have 
> > fused activation function, (When we have quantized model of TFLite, we 
> > don't have fused activation many cases), the output_min / output_max will 
> > be 0 / 255 to restrict int32 result. If we have relu6, output_min / 
> > output_max will be 0 / 6. So I think we are better put these two into conv 
> > argument. And we could avoid producing another clamp, just be calculated in 
> > conv2d requantize int32 -> uint8 process and it is nature.
> 
> In the case the activation is not fused, the values have to clamped to 0/255 
> or uint8 range, which is basically the out_dtype. So, we do not need any 
> extra information for the quantized_conv2d for going back to uint8/int8 other 
> than out_dtype. Correct?
> 
> Now, If the activation is fused, I agree that we will have two clamps now. 
> One inside the quantized_conv2d (0/255), and one for the relu6 (0/6). I think 
> this is fine. We can also write a Relay that replaces two back-to-back 
> clamping with one clamp Relay operator.
> 
> The reason I am saying this is that TFLite chooses one way to handle things, 
> which other frameworks might not. So, it is necessary to come up with right 
> abstractions first. The performance can be then be achieved by writing Relay 
> passes.

Yes, I agree that when we don't have an activation, we don't need anything extra. However, 
there is another thing we should consider: how to integrate with other libraries, such 
as QNNPACK. QNNPACK also needs output min / output max: 
https://github.com/pytorch/QNNPACK/blob/master/include/qnnpack.h#L62-L63


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-497074984

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-07 Thread Zhao Wu
@tqchen We are very busy with an internal project during this period. I will talk 
with @jackwish next Monday. However, sending the proposal may have to wait until we 
finish this project. Sorry about that.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-499817203

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-13 Thread Zhao Wu
@anijain2305 I looked at the code quickly and I understand your idea (combining operators to 
implement q_conv2d). However, as commented before, how do we integrate with 
QNNPACK if we don't have output_min / output_max? I think we could keep these 
two arguments; if MXNet doesn't need them, we could leave them at their default values.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-501962808

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-15 Thread Zhao Wu
@anijain2305 I understand your thoughts. I agree we should keep the 
API minimal. However, no matter how we do it, q_conv2d's int32 output has to be 
clamped into the uint8 range. Even if you don't pass min / max, you still need to do `output 
= std::max(output, 0)` and `output = std::min(output, 255)` before returning the output. 
So why not set the defaults to output_min = 0 / output_max = 255 and 
make the computation `output = std::max(output, output_min)` and `output = 
std::min(output, output_max)`, which is suitable for TFLite / MXNet / 
QNNPACK and so on. API design is very important; we should consider as many cases as 
we can (TFLite / MXNet, and even other libraries; QNNPACK is a very high-performance 
library on ARM CPU, and in my opinion we cannot avoid discussing it), otherwise we will 
have to do tricky workarounds in the future. This is the point I wished to express before.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502366780

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-15 Thread Zhao Wu
> @FrozenGene a clarifying question to your above comment. If we pass in the 
> output scale and shift can we not compute int32-> int8 by simply adding more 
> nodes in the graph.

I don't fully understand your comment. Do you mean whether we could avoid the int32 -> int8 
computation? If so, I think we cannot. We need the `requantize` operation (int32 -> 
int8) 
(https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/internal/reference/conv.h#L171)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502417024

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-15 Thread Zhao Wu
> > It appears to me this would let them simulate smaller than 8 bit 
> > quantizations.
> 
> If _simulating 8 smaller bit_ is the case, 8 bit should be able to hold 
> activation min/max value.

8 bits could hold it. But what would the values of output_min / output_max be? I think 
that is the point @jnorwood wants to make: we cannot simply use 
`out_dtype` to decide the value range.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502417146

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-15 Thread Zhao Wu
> @FrozenGene For the output_min and max, isn't the out_dtype enough? If its 
> uint8, we can clamp at 0 and 255. If its int8, we can clamp at -128 and 127. 
> I don't see any reason the values will be any different, unless you want to 
> fuse the quantized relu in the quantized convolution from the starting 
> itself. Please let me know if I am understanding something wrong. I think we 
> should not fuse operators in the frontend and let Relay graph fusion take 
> care of that.
> 
> Let's see what others think about this. @tqchen @yzhliu @ZihengJiang What are 
> your thoughts on this?

I think it is ok. If we do it this way, we should insert one clamp when we have an 
activation, like in our TFLite frontend:
```python
# If we have fused activations
if fused_activation_fn != ActivationFunctionType.NONE:
    if weight_tensor_type == TensorType.UINT8:
        # implement this function
        output_min, output_max = self.calculate_activation_range_uint8(
            output_scale, output_zero_point, fused_activation_fn)
        # insert clip
        out = _op.clip(out, output_min, output_max)
    else:
        out = self.convert_fused_activation_function(out, fused_activation_fn)
```
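
For reference, `calculate_activation_range_uint8` could be sketched along the lines of 
TFLite's `CalculateActivationRangeUint8` (a simplified illustration; the helper does not 
exist in the frontend yet, and the enum comes from the flatbuffers-generated `tflite` package):
```python
from tflite.ActivationFunctionType import ActivationFunctionType

def calculate_activation_range_uint8(scale, zero_point, fused_activation_fn):
    """Return (output_min, output_max) in the quantized uint8 domain."""
    qmin, qmax = 0, 255  # full uint8 range

    def quantize(f):
        # TFLite uses round-away-from-zero (TfLiteRound); plain round() is
        # close enough for this sketch.
        return zero_point + int(round(f / scale))

    if fused_activation_fn == ActivationFunctionType.RELU:
        return max(qmin, quantize(0.0)), qmax
    if fused_activation_fn == ActivationFunctionType.RELU6:
        return max(qmin, quantize(0.0)), min(qmax, quantize(6.0))
    if fused_activation_fn == ActivationFunctionType.RELU_N1_TO_1:
        return max(qmin, quantize(-1.0)), min(qmax, quantize(1.0))
    return qmin, qmax  # no activation: keep the full range
```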

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502418630

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> > Although the quantized conv result is held in uint8, it could be static 
> > casted to signed int8, or even fewer than 8 bit quantization. That would 
> > require both min and max saturations, as in the reference tflite quantized 
> > conv implementation
> 
> Ah, I see. That finally makes sense.
> So, this is not about activation. This is about what representation one is 
> using for storing the floating point values. For example, if it is 7-bits, we 
> will need the output min/max saturations. Cool, I will add them into the API 
> and add corresponding documentation.

See @jackwish's comment. As my `calculate_activation_range_uint8` code shows, 
only when there is no activation do we use the full range of the data type, i.e. if we don't 
have an activation, we will have 0 - 255 for uint8. If we have RELU6, we will 
have 
https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/kernel_util.cc#L152

So, what if we are on 7 bits? Alright, we could still use 8 bits to represent 
output_min / output_max in conv's compute kernel, i.e. the representable output_min / 
output_max is 0 / 255. But in our frontend, it would look like this:
```python
# If we are 7-bits
if weight_tensor_type == TensorType.UINT7:
    # implement this function
    output_min, output_max = self.calculate_activation_range_uint7(
        output_scale, output_zero_point, fused_activation_fn)
    # insert clip
    out = _op.clip(out, output_min, output_max)
```

That is to say, no matter whether we have an activation, we will have one `clip`. 
If there is no activation, we clamp to 0 / 127 (even though the representation is 
8 bits with range 0 / 255). If we have an activation, for example RELU6, the code 
changes as in 
https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/kernel_util.cc#L152:
```cpp
*act_min = std::max(qmin, quantize(0.0));
*act_max = std::min(qmax, quantize(6.0));
```
where q_min is 0 and q_max is 127.

So, if we decide to insert the `clip` operator in the frontend, we can handle fewer 
than 8 bits too.

One potential optimization:
If TVM supported a data type like UINT7, we could follow the same logic as for UINT8, which 
means we could avoid inserting the `clip` operator in the frontend when there is no 
activation (just set out_dtype to UINT7). However, I don't think this is 
the bottleneck.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502515514

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> `https://arxiv.org/pdf/1803.08607.pdf`

Qualcomm's way? Let us look at Google's TFLite model:
![image](https://user-images.githubusercontent.com/7287321/59577624-0f541000-90f7-11e9-9044-2153d6f9ccda.png)

We have quantized models in which RELU6 is not removed from dw conv / conv. I think we 
should focus on TFLite's code / TFLite's way.

Coming back to Qualcomm's paper: if we decide to support that way, we could also 
write the logic in the frontend and insert the correct `clip` operator. However, I think we 
have no obvious reason to support it.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502526450

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> In that case min and max values passed into the quantized conv are always 0 
> and 255.

Not true. When there is an activation, the range is not always 0 ~ 255. For 
example, for RELU:
```cpp
auto quantize = [scale, zero_point](float f) {
  return zero_point + static_cast<int32_t>(TfLiteRound(f / scale));
};
*act_min = std::max(qmin, quantize(0.0));
*act_max = qmax;
```
We have verified that computing it this way gives exactly the same results as 
TFLite.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502532416

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Zhao Wu
> > Not true. When there is activation, the range is not always 0 ~ 255. For 
> > example RELU,
> 
> I believe tflite extends the quantization range so it always includes 0, as 
> done in the gemmlowp quantization example below. I have dumped my min and max 
> saturation input values from the six quantized tflite models (two mobilenets 
> and four inceptions). They are all 0 and 255.
> 
> `https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc`
> 
> ```
> // Given the min and max values of a float array, return
> // reasonable quantization parameters to use for this array.
> QuantizationParams ChooseQuantizationParams(float min, float max) {
>   // We extend the [min, max] interval to ensure that it contains 0.
>   // Otherwise, we would not meet the requirement that 0 be an exactly
>   // representable value.
>   min = std::min(min, 0.f);
>   max = std::max(max, 0.f);
> ```

I think you may not have fully understood my previous comment. One question I 
want to ask: do your quantized models have conv + relu / relu6 like our model? 
If not, the range is obviously 0 ~ 255, no matter how many models you check. Please see: 
https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/kernel_util.cc#L138
@jackwish and I have emphasized this function many times.

Please construct a quantized model like ours:
![image](https://user-images.githubusercontent.com/7287321/59581062-36660e00-9106-11e9-93c1-2953571766f8.png)

I am sure you will observe a different result.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502543075

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-18 Thread Zhao Wu
> > I think you maybe don't understand fully of my previous comment. One 
> > question I want to ask: Do your quantized models have conv + relu / relu6 
> > like our model? If no, obviously is 0 ~ 255, no matter how many models are. 
> > Please see: 
> > https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/kernel_util.cc#L138
> >  I and @jackwish have emphasized many times of this function code.
> 
> The quantized mobilenet v1 inference model is from the tflite model 
> repository. The training model includes relu6 and batch normalization 
> operations, but these are fused into convolution operations in the inference 
> model, as the Netron diagram shows.
> 
> The link you reference shows floating point activation values that would be 
> applied during training. They do represent the range bound that would be 
> expected of the upscaled values in the accumulator in the inference model. 
> However the min and max saturation values passed into the inference quantized 
> convolution are applied _after downscale_ ... I previously provided the code 
> and the link. They are int32 values, not float values. They are applied after 
> both downscale and offset are applied. They are 0..255 even though the scaled 
> up range expected is 0..6 from the fused-in relu6 operation.
> 
> If the convolution and relu operations were separate, you would still see 0 
> and 255 for those min and max values because they are applied after downscale 
> and after offset are applied to the convolution accumulator. The min and max 
> values only function to saturate the downscaled result to the quantized uint8 
> bit range, avoiding wrap-around overflow/underflow of the 8 bit value if the 
> downscaled accumulator were simply masked to 8 bits.

I have emphasized that the model diagram is from a `quantized` model. Let me show the 
property in more detail: 
![image](https://user-images.githubusercontent.com/7287321/59662050-b95a9780-91de-11e9-89c2-b252b8b3a8ae.png)
That is to say, not all relu / relu6 can be fused into the convolution in TFLite's 
`quantized` models. The min / max are then what the code at 
https://github.com/tensorflow/tensorflow/blob/v2.0.0-beta1/tensorflow/lite/kernels/kernel_util.cc#L138
computes, `NOT` simply 0 ~ 255.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-502986425

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-05 Thread Zhao Wu
@anijain2305 Generally good. About performance on hardware, let us take ARM CPU as an example. 
For depthwise convolution, we can optimize well even without tensorize. After 
some int8 optimization work using pure TVM schedules (no tensorize), 
we can also beat QNNPACK (on some workloads we tested we are even more than 50% faster).

However, for normal convolution it is hard to achieve the best 
performance without tensorize. When we use tensorize, one thing we do is combine `bias_add` 
into `qnn.conv2d` to avoid extra memory accesses. As @jackwish's previous 
investigation showed, this is very important for ARM CPU performance. So, if we 
implement it as in the diagram, that is my only concern.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-508657963

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-05 Thread Zhao Wu
@tqchen, if we reuse avg_pool2d, we also need to modify it, but the modification 
is small. For example, we should widen the UInt8 sum to Int16 to 
avoid overflow. In our internal implementation, we use q_avg_pool2d to 
distinguish it from avg_pool2d. Relu shouldn't need modification. However, if we have 
activation fns, we should compute output_min / output_max with 
calculate_activation_range_uint8 as described before, then insert a clip operator.
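
A minimal sketch of the widening idea at the Relay level (the cast width and helper name 
are illustrative; the point is simply to accumulate in a wider type and clamp back to uint8):
```python
from tvm import relay

def quantized_avg_pool2d(data_uint8, pool_size=(2, 2), strides=(2, 2)):
    # widen before pooling so the internal sum cannot overflow uint8
    data_wide = relay.cast(data_uint8, "int16")
    pooled = relay.nn.avg_pool2d(data_wide, pool_size=pool_size, strides=strides)
    # clamp back into the uint8 range and narrow again
    return relay.cast(relay.clip(pooled, 0, 255), "uint8")
```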

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-508894783

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-07 Thread Zhao Wu
Let me try to relate your discussion to our internal implementation. For rounding 
(in requantize): when we get the `input_scale` / `kernel_scale` / `output_scale`, 
we want to compute the `shift` / `multiplier` (see: 
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/quantization_util.cc#L52
) and pass them to the real `requantize` (corresponding to 
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/common.h#L148
). If we decide to do this in the Python frontend like our internal implementation 
(we compute the `shift` / `multiplier` in Python and pass them to the real requantize), 
we should note that Python 3's round `rounds to the nearest "even" number`, not 
`away from zero` like C++ std::round. We could implement the latter very easily:
```python
import math

def _round_away_from_zero(value):
    abs_value = abs(value)
    result = math.floor(abs_value) + math.floor(2 * (abs_value % 1))
    return result if value >= 0 else -result
```
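For example, the behavioral difference (using the helper above):
```python
assert round(2.5) == 2                     # Python 3: rounds to nearest even
assert _round_away_from_zero(2.5) == 3     # matches C++ std::round
assert _round_away_from_zero(-2.5) == -3
assert _round_away_from_zero(2.4) == 2
```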
Maybe this is the rounding issue you are discussing?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-509075561

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-07 Thread Zhao Wu
> > slight difference in a single point(0.5) is fine and likely won’t have an 
> > impact on final acc
> 
> Yeah, I was planning to add a rounding param to the op. For "ceil", we could 
> just add a 0.5 rounding without worrying about negative values. For "round', 
> we can be more precise. By default, we can choose "ceil". What do you think?
> 
> Update - Maybe not, "ceil" is confusing. Let me think and come up with better 
> terms (like round-away-from-zero etc.).

If your "round" refers to the concept in my previous comment, maybe `round` (away from 
zero) is better, since it is the same as TFLite. IMO, if we cannot get the same results 
as TFLite, we cannot tell where things went wrong when the model is large, and we could 
have problems when we deploy it in an industrial environment, because algorithm teams 
usually verify accuracy with TFLite, not with TVM.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-509077139

Re: [dmlc/tvm] [RFC] Initial support for Tflite operator SPLIT (#3520)

2019-07-10 Thread Zhao Wu
Thanks @u99127 LGTM.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3520#issuecomment-510035031

Re: [dmlc/tvm] [RFC] Initial support for Tflite operator SPLIT (#3520)

2019-07-10 Thread Zhao Wu
@u99127 Please help make the CI green. It seems your op test causes the failure.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3520#issuecomment-510308397

Re: [dmlc/tvm] [RFC] Initial support for Tflite operator SPLIT (#3520)

2019-07-19 Thread Zhao Wu
@u99127 Could you modify the PR as I suggested? I think it will work now. Thanks.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3520#issuecomment-513176720

Re: [dmlc/tvm] [RFC] Initial support for Tflite operator SPLIT (#3520)

2019-07-19 Thread Zhao Wu
Thanks @u99127 LGTM now.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3520#issuecomment-513425857

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-22 Thread Zhao Wu
@anijain2305 Let me look at it this afternoon or evening.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3591#issuecomment-514031783

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-22 Thread Zhao Wu
@anijain2305 Could we also list the APIs of TFLite and QNNPACK? I think both 
should be considered, because we will parse TFLite models, and QNNPACK is a good 
accelerated quantization library.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3591#issuecomment-514070990

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-23 Thread Zhao Wu
@anijain2305 Could we list TFLite's ConvParams? I think it makes clearer 
what TFLite's convolution computation needs.

For the diagram, I think maybe we could avoid the intermediate `clip`, because 
we already have one `clip` in the `requantize` operator. If requantize accepts an `output_min` / 
`output_max`, then we just do one `clip` in `requantize` and need no intermediate 
`clip`.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3591#issuecomment-514457137

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-07-29 Thread Zhao Wu
I think lowering in Python makes sense.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3617#issuecomment-516232138

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-29 Thread Zhao Wu
> > **Covered frameworks for now** - TFLite and MxNet
> > **Target network for now** - Inception V3 from TFLite. (I will create one 
> > for Mxnet)
> > **Target platforms for now** - ARM and Intel (will create separate Issue as 
> > the project progresses)
> 
> A quick question here since I can't see this mentioned on #3591
> 
> Is this network going to be quantized per tensor as well as the new 
> per-channel quantization that is appearing in tflite 2.0 ? IIUC, tf1.13 has 
> per tensor quantization rather than the per channel quantization. i.e. more 
> interestingly can the relay design support both ?
> 
> https://www.tensorflow.org/lite/performance/quantization_spec?source=post_page---#per-axis_vs_per-tensor
> 
> regards
> Ramana
> 

Good question. We have only supported TF 1.13 quantization. TF 2.0's per-channel 
scales were not considered in the previous discussion, so it seems there is a gap 
here. cc @anijain2305

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2351#issuecomment-516232910

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-08-01 Thread Zhao Wu
If we have time, we could investigate why we can't reach 252 GFLOPS or 
more. Only 73% hardware efficiency means there is still a lot of room to dig into.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3388#issuecomment-517196006

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-08-01 Thread Zhao Wu
TFLite models' average_pool / max_pool do not have an input_zero_point, so for 
TFLite we don't need to subtract the zero_point. Does MXNet have one?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3617#issuecomment-517522729

Re: [dmlc/tvm] [RFC][DEV] TVM Project Repo Migration (#4212)

2019-11-03 Thread Zhao Wu
Does it affect developers, especially pull requests in progress, since the 
git remote URL has changed?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/4212#issuecomment-549244436

Re: [apache/incubator-tvm] [DEV][DRAFT] TVM v0.6 Release candidate (#4259)

2019-11-08 Thread Zhao Wu
Though the C++ RPC hasn't been reviewed yet, personally I wish the C++ RPC 
(https://github.com/apache/incubator-tvm/pull/4281) could get into 0.6. It 
is a very useful feature for embedded devices; we use it in our 
production development almost every day.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4259#issuecomment-551657497

Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

2019-11-13 Thread Zhao Wu
cc @sf-wind  

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4332#issuecomment-553691132

Re: [apache/incubator-tvm] [DEV][DRAFT] TVM v0.6 Release candidate (#4259)

2019-11-20 Thread Zhao Wu
Do we have a deadline now? I have one thread-performance PR, #4344.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4259#issuecomment-555898203

Re: [apache/incubator-tvm] [RFC] Apache TVM 0.6.0 Release Candidate (verifying, feedback, etc.) (#4406)

2019-11-24 Thread Zhao Wu
one typo... [RUTNIME] -> [RUNTIME]

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4406#issuecomment-557994453

Re: [apache/incubator-tvm] [RFC] Asymmetric padding for convolution (#2682)

2019-12-10 Thread Zhao Wu
Yes. We should keep the backward compatibility.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/2682#issuecomment-564353521

[apache/incubator-tvm] [RFC] Module based Model Runtime Interface (#5038)

2020-03-10 Thread Zhao Wu
As discussed in https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025, 
we want to support a Module based Model Runtime Interface and solve 
the following challenges:

- R1: The creation of an ML model can be context dependent. For example, the user 
needs to be able to specify which GPU to run the graph runtime on.
- R2: We are starting to have multiple variations of model runtime, such as RelayVM. 
While it does not make sense to force all model runtimes to have the same set 
of APIs, it would be helpful to have the same mechanism for packaging and loading.
- R3: In advanced use cases, we want to be able to bundle multiple models into 
a single shared library.

After discussion, we have sorted out the API and reached an agreement. Here, I 
want to summarize the API and give an example.

```python
# lib is a GraphRuntimeFactoryModule
# that contains the graph json and the parameters
lib = relay.build(...)

# We could export it to a shared library and load it back.
# Here, we provide one option to let the user control whether we
# want to package_params or not. The default value is True.
lib.export_library("resnet18.so", package_params=True)

# load it back
lib = tvm.module.load("resnet18.so")

# Call into the factory module to create a graph runtime.
# Having this additional factory create step solves R1.
# Note that parameters are already set.

# The first argument is a key that helps to solve R3;
# a list of contexts could be allowed in the future:
# gmod = lib["resnet18"]([tvm.cpu(0), tvm.gpu(0)])
gmod = lib["resnet18"](tvm.cpu(0))

set_input = gmod["set_input"]
run = gmod["run"]
get_output = gmod["get_output"]

# We do not need to set the parameters here
# as they are already packaged into the module
set_input(data=my_data)
run()
get_output()

# we could also use the wrapper
gmod = tvm.graph_runtime.create(lib["resnet18"], tvm.cpu(0))
gmod.set_input(data=my_data)
gmod.run()
gmod.get_output()
```

More details and the decision process can be found at: 
https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/5038

Re: [apache/incubator-tvm] [RFC] Module based Model Runtime Interface (#5038)

2020-05-30 Thread Zhao Wu
> @FrozenGene can we follow up on this?

Hi @tqchen, I will start working on it from next Monday! Sorry for getting to 
it late because of other things.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/5038#issuecomment-636294461

Re: [apache/incubator-tvm] [RFC] Module based Model Runtime Interface (#5038)

2020-06-09 Thread Zhao Wu
Draft PR: https://github.com/apache/incubator-tvm/pull/5753. There are still many 
things to do, but I will keep updating that PR.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/5038#issuecomment-641354644

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-10 Thread Zhao Wu
Thanks for the great work! I have some quick questions:
1. Have you tested various ARM CPUs (like A53, A72, A55, A75 and so on)? 
According to the FB QNNPACK blog, it is not always possible to get the best performance using 
`umul` compared with the `smlal` instruction (used now) 
(https://engineering.fb.com/ml-applications/qnnpack/). So just changing the 
legalization and giving up the `smlal` instruction on AArch64 maybe doesn't make 
sense to me. One proof: our upcoming feature `Ansor` (auto-scheduler) doesn't 
support tensorize (at least for now); however, it can get nice performance 
using the `smlal` instruction and beats TFLite (1.2X) on the MobileNet V2 quantized 
model (Cortex-A53) 
(https://discuss.tvm.ai/t/tflite-and-tvm-comparison-for-quantized-models/6577/4).
 I mean here:
```python
@qnn_conv2d_legalize.register('arm_cpu')
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    # ARM prefers the dtypes to be same.
    if is_aarch64_arm():
        return helper_change_dtypes_to_be_same(attrs, inputs, types,
                                               relay.qnn.op.conv2d)
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types,
                                               relay.qnn.op.conv2d)
```
This disables us from using the `SMLAL` instruction.

2. I suggest we keep both schedules (tensorize and the default spatial pack), not 
just check `aarch64` and use only the tensorize template. I mean here:
```python
is_aarch64 = "aarch64" in str(isa.target)
if is_aarch64 and data.dtype in ["int8", "uint8"]:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.arm_cpu.compute_conv2d_NHWC_quantized),
        wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_NHWC_quantized),
        name="compute_conv2d_NHWC_quantized.arm_cpu")
else:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.arm_cpu.conv2d_nhwc_spatial_pack),
        wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nhwc_spatial_pack),
        name="conv2d_nhwc_spatial_pack.arm_cpu")
```

This is the design purpose of strategy. I suspect there are some workloads where our 
spatial pack could perform better. The situation is the same as with Winograd: we 
can keep both the Winograd and the default template and choose the better one.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642374967

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-10 Thread Zhao Wu
cc @ajtulloch 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642375802

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
Glad to see we have the same thought that we should let AutoTVM select the best.

The auto-scheduler relies on the legalization pass to generate the smlal instruction (after the auto-scheduler 
is released, let us improve it together). One piece of information I 
missed before: the OS on my test Raspberry Pi 3B+ is 64-bit Ubuntu, not 32-bit, so the 
target is aarch64 too.

I mention the auto-scheduler not to question your work (your work is very 
great!), and it is orthogonal as you said. I just mention that using the smlal instruction on 
the A53 (with the aarch64 OS mentioned before) we can get nice performance too. So I want 
to know whether smlal is better than this approach on low-end ARM CPUs (as the FB QNNPACK 
blog says: "The default microkernel uses the fewest possible instructions and 
thus delivers the best performance on low-end cores, which can execute only one 
NEON instruction per cycle.").

So I wish we could test several ARM CPUs to prove this work performs well on all 
aarch64 cores (low-end and high-end).

Secondly, I suggest we test MobileNet V2 too, to see whether this PR 
works well across various models.

Your work is great, but I wish we could use more data and results to make it 
more convincing.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642541198

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
> 1. It will be hard to do this. The point is that the legalization is done in 
> Relay before picking the strategy (thus, it is unaware of the strategy 
> picked). To keep both legalizations I need somehow to pass information from 
> the strategy (e.g., the name of the algorithm, or something like that). Are 
> you aware of any other ways I can do it?

@giuseros I think adding the algorithm name could be one way to handle it. For 
example, we could add it to the `attrs` and query it in the legalization pass, 
and then handle it safely there.
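
A hypothetical sketch of that idea (the `algorithm_name` attribute and its value do not 
exist today; this only illustrates branching in the legalization based on the chosen 
strategy, reusing the helper names from the snippet above):
```python
@qnn_conv2d_legalize.register("arm_cpu")
def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
    # hypothetical attribute recorded when the strategy is picked
    algo = getattr(attrs, "algorithm_name", "")
    if is_aarch64_arm() and algo == "conv2d_NHWC_quantized":
        # tensorized gemm path: keep input/kernel dtypes the same
        return helper_change_dtypes_to_be_same(attrs, inputs, types,
                                               relay.qnn.op.conv2d)
    # spatial-pack path: fall back so the smlal-friendly lowering is kept
    return helper_no_fast_int8_hw_legalization(attrs, inputs, types,
                                               relay.qnn.op.conv2d)
```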

> Note that I am targeting NHWC layout. I wasn't able to even compile with 
> conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it 
> without auto-tuning on Armv8-A). I gathered from various discussions that 
> NHWC support for arm targets is incomplete at the moment. So for now we might 
> simply agree to leave this as default for NHWC and conv2d_nchw_spatial_pack 
> as default for NCHW and mirror that in the legalization step which might look 
> like:
```python
if is_aarch64_arm() and attrs.data_layout == "NHWC":
    return helper_change_dtypes_to_be_same(attrs, inputs, types,
                                           relay.qnn.op.conv2d)
return helper_no_fast_int8_hw_legalization(attrs, inputs, types,
                                           relay.qnn.op.conv2d)
```

Yes, our NHWC schedule on ARM CPU is not complete. After careful 
testing, NHWC also performs better than NCHW on ARM CPU when using Ansor (aka the auto-scheduler), 
so this suggests we could improve our AutoTVM NHWC schedule 
on ARM CPU too. As the results I showed in that post rely on the NHWC layout and the 
`smlal` instruction via the auto-scheduler, I would prefer to leverage the 
`attr[algorithm_name]` mentioned previously to keep the `smlal` instruction. After 
the auto-scheduler is released (we are working hard on it and hope to bring it in after about 
2 weeks), we can see how to improve it further (like generating smlal and 
smlal2, or your tensorize instruction); they are orthogonal but share the 
same legalize pass.

One piece of background on the auto-scheduler: it only needs tvm.compute, 
and then it generates the schedule automatically, so we can try NHWC / NCHW 
easily. So there is, in fact, no spatial pack schedule template concept in the auto-scheduler 
world.
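
To illustrate that last point, a minimal compute-only declaration is all the auto-scheduler 
needs (a simplified NHWC convolution without padding, written with `tvm.te` for illustration; 
no schedule template is defined):
```python
from tvm import te

def conv2d_nhwc_compute(N, H, W, CI, CO, KH, KW, dtype="int16"):
    data = te.placeholder((N, H, W, CI), name="data", dtype=dtype)
    kernel = te.placeholder((KH, KW, CI, CO), name="kernel", dtype=dtype)
    rh = te.reduce_axis((0, KH), name="rh")
    rw = te.reduce_axis((0, KW), name="rw")
    rc = te.reduce_axis((0, CI), name="rc")
    conv = te.compute(
        (N, H - KH + 1, W - KW + 1, CO),
        lambda n, h, w, co: te.sum(
            data[n, h + rh, w + rw, rc].astype("int32")
            * kernel[rh, rw, rc, co].astype("int32"),
            axis=[rh, rw, rc]),
        name="conv2d_nhwc")
    return [data, kernel, conv]
```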

> About the Raspberry+mobilenet v2, good to know you are working on Armv8-A 
> (sorry to have assumed otherwise). However, there is still the point that 
> mobilenet uses shallow convolutions, while I am addressing deeper and more 
> generic convolutions.

So we should keep both algorithms, right?


> Are you saying that, as things stand now in TVM, the conv2d_nhwc_spatial_pack 
> schedule might be faster than the gemm approach on smaller CPUs? 
> Unfortunately, for now I don't think they can be added together because of 
> what I said above about the legalization step. Do you know any work-around to 
> that? Maybe I can legalize only for specific devices (e.g., only for 
> Cortex-A55)?

I think adding the algorithm name, as mentioned before, might help solve it.

> Finally, as things stand now we might get this PR in, and later do a more 
> detailed comparison across different networks + CPUs

Ok, I buy it. After the legalization pass issue we discussed is solved, I will gladly 
review the code carefully and handle this PR.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642577581

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
> Hi @FrozenGene ,
> 
> The idea of adding the algorithm name to the attributes would work if the 
> legalization step was run after we pick the strategy. It is instead run 
> before, so it is unaware of the strategy picked. 
> 
> 
> 
> Maybe we could add a new pass that runs based on the strategy? Or we can hack 
> in `_alter_conv2d_layout`? 
> 
> 

@giuseros what do you mean by running it based on the strategy?

In alter_op_layout, we could extract workload[0] to get the strategy. However, could 
you help me double check whether our AutoTVM tuning runs the alter_op_layout 
pass (i.e. opt level 3)? I have forgotten a bit. If so, maybe we could change the 
dtype there according to the strategy. cc @anijain2305 any better idea?


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642601817

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
> So I mean to add a `convert_data_type` pass that is similar to 
> `alter_op_layout` but converts datatype (and we can do something like `if 
> topi_impl == 'spatial_nhwc' converts to int16`.

I think this is an interesting pass. Just as we have `_alter_op_layout` with 
different logic for different strategies, we would have an `_alter_op_dtype` 
pass with different logic for different strategies.

However, this pass seems to do mostly the same thing as legalize (changing the dtype). 
So our legalization pass should be able to do this work according to the chosen 
strategy.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642651252

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
@giuseros I suddenly remembered that the auto-scheduler will have an environment variable, 
so the change to legalization won't affect the auto-scheduler; we could check the 
value of that environment variable and use `smlal` for the auto-scheduler. However, 
I still think we should resolve the underlying problem of allowing different 
strategies to have different legalization logic.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642671388

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-11 Thread Zhao Wu
> Hi @FrozenGene ,
> I agree that different strategies should be available to the auto-tuner. See 
> if the solution proposed is good enough for you (at least as a temporary 
> work-around). For Armv7-A or NCHW, nothing changes, we follow exactly the 
> previous path.
> 
> For Armv8-A and NHWC we don't convert during the legalization step, but 
> during the `_alter_conv2d_layout` pass. The only difference now is that the 
> offset contribution will be added after the convolution instead than before.
> 
> I agree that a better solution, where the legalization changes depending on 
> the strategy, would be better. However, I don't think the legalization step 
> has got enough information to know the strategy (for now).
> 
> What do you think?

I think it is ok.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642690898

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-12 Thread Zhao Wu
> Hi @FrozenGene ,
> I gave it another go, but switching legalization on the strategy seems very 
> hard (since we would need the auto-tuner to pick the best data-type for us).
> 
> So for now, we have to content with the `_alter_conv2d_layout` workaround and 
> try to think a bit more on how we can infer the strategy during legalization

I think I can accept this approach.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-643286046

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-17 Thread Zhao Wu
> @FrozenGene Can you please review when you get time?

Yep. I could review it tomorrow.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-645406262

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-22 Thread Zhao Wu
> Hi @FrozenGene , @anijain2305 ,
> Any update on this review?
> Also, is there a way to retrigger the tests? Or should I contact someone in 
> particular?
> 
> Thanks

For the CI, maybe you could force-trigger it, or you could comment on the PR (and 
contact @jroesch) and explain the reason.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-647615333

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-22 Thread Zhao Wu
@anijain2305 could you take another look?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-647889871

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-22 Thread Zhao Wu
Merged #5754 into master.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#event-3471174912

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-22 Thread Zhao Wu
Thanks @giuseros @anijain2305 MERGED NOW.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-647914319

Re: [apache/incubator-tvm] [COMMUNITY] jcf94 -> Reviewer (#6241)

2020-08-10 Thread Zhao Wu
Merged #6241 into master.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/6241#event-3637878357

Re: [apache/incubator-tvm] [VOTE] Release Apache TVM (incubating) v0.7.0.rc0 (#6622)

2020-10-05 Thread Zhao Wu
+1, I checked

- signature and hash
- LICENSE, DISCLAIMER and NOTICE
- Version
- Code compiles


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/6622#issuecomment-703569247

Re: [apache/tvm-rfcs] [RFC][Frontend] Add a PaddlePaddle Frontend (#19)

2021-08-08 Thread Zhao Wu
@jiangjiajun Thanks for bringing this RFC, which will help TVM serve 
PaddlePaddle users more conveniently. The RFC is generally good. 
However, there are many spelling and grammar errors, so I suggest you do one 
round of proof-reading.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/19#issuecomment-894974345

Re: [apache/tvm-rfcs] [RFC][Frontend] Add a PaddlePaddle Frontend (#19)

2021-08-12 Thread Zhao Wu
Merged #19 into main.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/19#event-5153185886

Re: [apache/tvm-rfcs] [RFC][Frontend] Add a PaddlePaddle Frontend (#19)

2021-08-12 Thread Zhao Wu
@jiangjiajun Please go ahead and handle the PR.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/19#issuecomment-898096998

Re: [apache/tvm] [Release] v0.8 Release Planning (#8976)

2021-09-13 Thread Zhao Wu
> Agree with @leandron that we could firstly refer to the items there. Many 
> "initial" features in v0.7 are now stable. For example:
> 
> * Initial automatic scheduling support -> stable.
> 
> * Initial command line driver interface -> stable.
> 
> * Intial Hexagon support -> stable.
> 
> * Bring your own codegen (BYOC) support -> now we have several backends.
>   
>   * [stable] NVIDIA TensorRT, Xilinx Vitis-AI, ARM compute library, ARM 
> Ethos-N, etc.
>   * [experimental] TBA.

Is our Hexagon support stable now? I am not sure about it. As I see, there are 
still active pull requests (like https://github.com/apache/tvm/pull/8986 to 
support model launch). I think the status may still not be stable. 
@kparzysz-quic should give a more definitive answer.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8976#issuecomment-917932363

Re: [apache/tvm-rfcs] [RFC][OpenCLML] OpenCLML integration as BYOC (PR #52)

2022-01-26 Thread Zhao Wu
LGTM. I don't have any other comments.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/52#issuecomment-1022872136
You are receiving this because you are subscribed to this thread.

Re: [apache/tvm-rfcs] [RFC] Relax Upstreaming (PR #89)

2022-10-03 Thread Zhao Wu
Based on our experience at NIO, dynamic shape support in Relax is **extremely** 
important for us. In fact, we have done many things on top of Relay trying to cover 
dynamic shape support for our use cases; however, the lack of first-class support 
for symbolic dynamic shapes still constrains us, and some ops / patterns cannot be 
expressed in our models. First-class support for symbolic dynamic shapes is 
**extremely**, **extremely** important for us, especially since some models have 
essentially dynamic input / output, for example point cloud models. Relax is 
what I have been waiting for for so long. If we have Relax, then for point cloud or 
object detection models with dynamic output / dynamic batch, Relax can 
solve the problem perfectly (from the view of both functionality and performance).

For anyone who has doubts, I recommend reading this: 
https://github.com/tlc-pack/relax/wiki/Relax-Shape-Computation-Design.

Thanks @YuchenJin @tqchen and the many Relax authors for bringing it; I really appreciate 
this work 👍 and, in fact, I am evaluating Relax internally and want to let Relax 
solve our problems ASAP.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1266326396
You are receiving this because you are subscribed to this thread.

Re: [apache/tvm] [VOTE] Establish TVM Unity Connection Technical Strategy (Issue #12651)

2022-10-03 Thread Zhao Wu
+1 (binding)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/12651#issuecomment-1266327470
You are receiving this because you commented.

Message ID: 

Re: [apache/tvm] [VOTE] Clarify Community Strategy Decision Process (Issue #15521)

2023-08-10 Thread Zhao Wu
+1

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/15521#issuecomment-1674102077
You are receiving this because you are subscribed to this thread.

Message ID: 

Re: [apache/tvm-rfcs] [RFC] Relax Upstreaming (PR #89)

2023-09-13 Thread Zhao Wu
I want to know whether we have a plan for when to merge the Unity branch into the 
main branch. As LLMs are so popular now, we cannot support them well without Unity. 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1717067037
You are receiving this because you are subscribed to this thread.

Message ID: 

[TVM Discuss] [Development] Upstreaming tensorize implementation

2019-04-12 Thread Zhao Wu via TVM Discuss


What hardware platform are you working on? For tensorize, you could add one 
compute / schedule template like ‘winograd_nnpack_fp32’.

For tensorize, is the external library a must? Or could you include only the 
content we really need? In my opinion, one file should be enough for the tensorize 
microkernel (see the sketch below); we could have 4x8 / 8x8 ukernels and so on. If 
the external library is a must, you should add it under src/contrib like NNPack.
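
To make the "one file is enough" point concrete, here is a rough sketch of a tensor 
intrinsic whose body just calls an external ukernel via `call_extern`. The ukernel 
name `dot_int16_ukernel_8` is hypothetical, and the API follows the current 
`tvm.te` tensorize tutorial, which may differ from the version you are on:

```python
import tvm
from tvm import te


def intrin_dot_8():
    """Declare an 8-wide int16 dot-product intrinsic backed by an external ukernel."""
    n = 8
    a = te.placeholder((n,), dtype="int16", name="a")
    b = te.placeholder((n,), dtype="int16", name="b")
    k = te.reduce_axis((0, n), name="k")
    c = te.compute(
        (1,),
        lambda _: te.sum(a[k].astype("int32") * b[k].astype("int32"), axis=k),
        name="c",
    )

    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1, strides=[1])
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

    def intrin_func(ins, outs):
        aa, bb = ins
        cc = outs[0]
        ib = tvm.tir.ir_builder.create()
        # The body is a single call into the hand-written microkernel, which can
        # live in one .c/.S file instead of a full external library.
        ib.emit(tvm.tir.call_extern("int32", "dot_int16_ukernel_8",
                                    cc.access_ptr("w"), aa.access_ptr("r"),
                                    bb.access_ptr("r"), n))
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})
```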





---
[Visit 
Topic](https://discuss.tvm.ai/t/upstreaming-tensorize-implementation/2199/2) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/8982d8058502eb8030e86d1ef65783abbab90d29d2b314f65b7080e16638ac2e).


[TVM Discuss] [RFC] Introducing Hexagon backend

2019-05-02 Thread Zhao Wu via TVM Discuss


However, the GetTempAllocaAlignment function halves the alignment on each 
iteration of a while loop. I am worried that this will conflict with the DSP’s 
rule, which requires 128-bit alignment.





---
[Visit Topic](https://discuss.tvm.ai/t/introducing-hexagon-backend/2421/7) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/7985e1cd671f904d039b0a44572f8daa5d12fe54fc327ca4d789c26c60e03a07).


[TVM Discuss] [Development] [RFC][DISCUSS] Non-Recursive AST Visiting

2020-04-06 Thread Zhao Wu via TVM Discuss


Yes. @yunjing_lh found that our quantized MobileNet V2 has the same issue with GCC 
5: QNN lowers to too many ops and the recursion goes very deep. One workaround is 
to increase the stack size. However, non-recursive AST visiting should be a better 
way.





---
[Visit 
Topic](https://discuss.tvm.ai/t/rfc-discuss-non-recursive-ast-visiting/2481/4) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/284a1ff335a1dd6b78235cccd5cac186ba4d3fbe0a27dce3edafb1aa2c13eec3).


[TVM Discuss] [Development/RFC] [DISCUSS] Module based Model Runtime Interface

2020-04-12 Thread Zhao Wu via TVM Discuss


Thanks for the response. In the end we don't use this special hack. We generate 
this directly using LLVM IR, and LLVM will put it into the `rodata` section 
correctly. 

Like this test:
![image|690x165](upload://nkm1SoLvI1b36CZyZHWAIRUd7bi.png) 

![image|690x397](upload://pXekhJ0Qe1ilLipMCtLW5JCP3DP.png)





---
[Visit 
Topic](https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025/53)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/4160e15af6d4efd9539f9af0abf808dce916907bb9890b9fca192cea36e97f80).


[TVM Discuss] [Development/RFC] [DISCUSS] Module based Model Runtime Interface

2020-04-12 Thread Zhao Wu via TVM Discuss


CUDA could also use this, because CUDA's target host is LLVM. The example I show 
is in fact the CUDA target, so you can see `NVIDIA NNVM Compiler` in the constant 
string.





---
[Visit 
Topic](https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025/55)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/ce7efb7903b8451222030bb9f082a816040941c136c4230783936e00f428902c).


[TVM Discuss] [Development/RFC] [RFC] CoreML Runtime

2020-04-12 Thread Zhao Wu via TVM Discuss


I think leveraging Apple’s Neural Engine is one good motivation (we could add one 
example of how to leverage this). As we already have a TFLite runtime, I think 
adding a CoreML runtime is reasonable. 

[quote="kazum, post:1, topic:6309"]
Instead, we compile a CoreML model with the xcode `coremlc` command.
[/quote]

Does this require a particular Xcode version? My Xcode is 10.1 but I cannot find 
`coremlc`, only the `coremlcompiler` command.





---
[Visit Topic](https://discuss.tvm.ai/t/rfc-coreml-runtime/6309/2) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/28bd90d1b55f259ca0c7df411a8927ba45324d30cf64b54c6d83036676f22331).


[TVM Discuss] [Development/RFC] [RFC] CoreML Runtime

2020-04-13 Thread Zhao Wu via TVM Discuss


[quote="kazum, post:3, topic:6309"]
$(xcode-select -p)/usr/bin/coremlc
[/quote]

Xcode 10.1 works. (Post must be 20 characters)





---
[Visit Topic](https://discuss.tvm.ai/t/rfc-coreml-runtime/6309/4) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/9084e723691f589c2394e4d83c9655fee171c406197ef725e5248f9559dc6efc).


[TVM Discuss] [Development/RFC] Deprecate OpenGL/WebGL in favor of Vulkan/WebGPU

2020-04-14 Thread Zhao Wu via TVM Discuss


I have been told several times about web acceleration on mobile (the scenario is 
often an H5 app). For compatibility with most mobile phones, they may even be 
limited to WebGL1 (not WebGL2). How do we consider this situation?





---
[Visit 
Topic](https://discuss.tvm.ai/t/deprecate-opengl-webgl-in-favor-of-vulkan-webgpu/6364/3)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/cfbc9dc82240e9a3dcfcb6ace0dc2d7f94de5766551fc4cfef79588118944829).


[TVM Discuss] [Development/RFC] [DISCUSS] Module based Model Runtime Interface

2020-04-15 Thread Zhao Wu via TVM Discuss


When we don’t have LLVM, we will fall back to our original way (calling the 
compiler to generate it).





---
[Visit 
Topic](https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025/58)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/073c2c59f0f4fc849ff145be542dab6d6b35903e0527e8a1084ed6e364133229).


[TVM Discuss] [Development/RFC] [DISCUSS] Module based Model Runtime Interface

2020-04-15 Thread Zhao Wu via TVM Discuss


I think I should clarify your question. Do you mean we should generate the .rodata 
section for `unsigned char __tvm_data_blob[]`?





---
[Visit 
Topic](https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025/60)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/c2048539a9a3cfcd92779df17b4d91373aa975a6dff0f720067a0ae634d5c00f).


[TVM Discuss] [Development/RFC] [DISCUSS] Module based Model Runtime Interface

2020-04-15 Thread Zhao Wu via TVM Discuss


OK, makes sense. If all agree, we could improve our fallback path to put the TVM 
blob in the rodata section.





---
[Visit 
Topic](https://discuss.tvm.ai/t/discuss-module-based-model-runtime-interface/5025/62)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/f6c278f2da806f11a655d1d4c19ea6e2e07b1187c9ab1eece37e77139fbd).


[TVM Discuss] [Development/RFC] Introduce new frontend for Caffe

2020-06-09 Thread Zhao Wu via TVM Discuss


Good job! One quick question: many folks using Caffe define their own proto 
(though this is rare in other frameworks). Do we consider this situation?





---
[Visit Topic](https://discuss.tvm.ai/t/introduce-new-frontend-for-caffe/6918/2) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/0fbce41404aaf3b25a778f7596b0f49f76ff557d54c68c906e3a74354ca0f4b8).


[TVM Discuss] [Development/RFC] Introduce new frontend for Caffe

2020-06-09 Thread Zhao Wu via TVM Discuss


I think putting `caffe_pb2.py` in the `frontend` folder is fine. We already have 
some other helpers like `mxnet_qnn_op_utils.py` / `tflite_flexbuffers.py` and so 
on.





---
[Visit Topic](https://discuss.tvm.ai/t/introduce-new-frontend-for-caffe/6918/4) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/f26adc0811feb4a22df95630576fb10fc319fbd40f02d8472e143f7ea1e758d6).


[TVM Discuss] [Development/RFC] [RFC][Relay][Topi] Hashtable support

2020-06-10 Thread Zhao Wu via TVM Discuss


@lfengad Could we follow up on this? We have supported String and Array 
now.





---
[Visit 
Topic](https://discuss.tvm.ai/t/rfc-relay-topi-hashtable-support/5842/12) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/1be2a2f1616acf1723555ae2ffc57cf76c8f5575f96103673c40e8190e2f3de9).


[TVM Discuss] [Development] Add the document for TVMDSOOp

2020-06-11 Thread Zhao Wu via TVM Discuss


I agree with @zhiics. An official tutorial is important. Besides @zhiics's 
content, we could also include one example of how to integrate it with a TensorFlow 
model end to end, not just the low-level `tvm.build`. This will be the common 
situation in which users want to use it.





---
[Visit Topic](https://discuss.tvm.ai/t/add-the-document-for-tvmdsoop/6622/5) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/cfe9955a7b66abf1d6854ae06cca35675de1411e5f1d38362e560327c2d7c98f).


[TVM Discuss] [Development/RFC] [RFC] Ansor: An Auto-scheduler for TVM (AutoTVM v2.0)

2020-06-17 Thread Zhao Wu via TVM Discuss


We do support generating OpenCL, so we could run on Mali GPU. However, we didn't 
test it on Mali GPU when we completed Ansor. There are some differences compared 
with Nvidia GPU; for example, on Mali GPU we shouldn't use 
`cache_read("shared")` because Mali GPU doesn't have separate shared memory 
like Nvidia GPU, and we should generate `vectorize` explicitly, which is not 
required on Nvidia GPU.

We have collected performance data for TFLite quantized models on ARM CPU. 
However, we didn't put it in the paper. I am glad to share it:

![image|360x217](upload://kOVtkrTGnilHXZF4aCFqSDGr3xR.png) 

The target is 4 cores of Cortex-A53, the QNNPACK commit is 
b7bacb1899e6fa3a934c1dd6128096f2e1abf071, and only convolution is counted. 
As you can see, we have competitive performance compared with TFLite (2.1) 
and libraries like QNNPACK. However, we still have room to improve; for 
example, we should generate the paired instructions (`smlal` / `smlal2`), which 
maybe could be done by tensorize.





---
[Visit 
Topic](https://discuss.tvm.ai/t/rfc-ansor-an-auto-scheduler-for-tvm-autotvm-v2-0/7005/10)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.ai/email/unsubscribe/b8265a98075df24bff1c38c633f5dae7ee516403e8b3993c1113a1ff588673d8).


[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-17 Thread Zhao Wu via Apache TVM Discuss


The performance not exceeding dense can have many reasons, but I think the 
Strassen algorithm is not a key part. @jcf94 has done some experiments on this.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/6) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/787522a6de53ef6e25c094f05cc2c8e20fffd483b3fe645dccebbbaf7d6ff4ba).


[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-17 Thread Zhao Wu via Apache TVM Discuss


@jcf94 has explained the Strassen algorithm very well. The link you posted is 
something I wrote. However, note that my post is not meant to show the best 
performance TVM could achieve, just how easily TVM can reach reasonable 
performance (beyond NumPy). 

If we still want to improve performance, we can dig further, for example by 
adding `cache_write` for the matmul output stage / adding an `auto_unroll` 
configuration and so on. However, I think this should be handled by our 
AutoTVM v2.0 (Auto Scheduler). You could try our auto scheduler. Simple matmul 
using topi should be upstreamed completely, right? cc @jcf94





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/9) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/1b89bdfb3f2ca6da12489b46e4be039c5a135c5dd45ad6ba9b8e08397a531f07).


[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-18 Thread Zhao Wu via Apache TVM Discuss


I don't think you should set `TVM_NUM_THREADS` on ARM because of ARM's big.LITTLE 
architecture. I think you should call `runtime.config_thread_pool` to do the core 
binding work (see the sketch below). Another thing is that we shouldn't let TVM 
worker threads run on CPUs of different frequencies (i.e. one worker thread on a 
big CPU, one worker thread on a LITTLE CPU); this will bring worse performance.
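
A minimal sketch of the core-binding call, assuming the thread pool configuration 
is exposed as the `runtime.config_threadpool` global PackedFunc taking 
`(affinity_mode, nthreads)`, with 1 meaning big cores and -1 meaning LITTLE cores 
(please double-check the exact name and argument order against your TVM version):

```python
import tvm

# Bind the TVM worker threads to 4 LITTLE cores (affinity_mode = -1).
# Use 1 for the big cluster; the mode/argument convention is an assumption here.
config_threadpool = tvm.get_global_func("runtime.config_threadpool")
config_threadpool(-1, 4)
```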





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/12) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/6087b155e7bad925912521b9af71acb320512ecc0b166de2c24c8f7cb94b4295).


[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-18 Thread Zhao Wu via Apache TVM Discuss


In your case, the current code will use 4 cores (id 0 ~ 3), so the parallelism 
brings you better performance.

About the time-consuming functions: do you use AutoTVM? If you use AutoTVM, the 
default CPU TVM uses is the big core (that is, index 7). If you decide to use the 4 
little cores, you should make AutoTVM use these 4 little cores too. One elegant 
way would be a `thread_mod` setting for users (see link: 
https://discuss.tvm.apache.org/t/autotvm-rpcrunner-and-tvm-num-threads/3534/11?u=frozengene).
 A current workaround is to temporarily disable cores 4, 5, 6, 7 on the device, as 
sketched below. (We do indeed need to provide an interface for users to control 
big / little cores when tuning.)
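
A rough sketch of that workaround (Linux only, needs root; the core ids 4-7 are 
just the example big cores from this thread), taking the big cores offline so that 
both tuning and deployment only ever see the LITTLE cores:

```python
# Take cores 4-7 offline via sysfs so neither AutoTVM nor the runtime can use them.
# Write "1" back later to re-enable the cores.
for cpu in (4, 5, 6, 7):
    with open("/sys/devices/system/cpu/cpu%d/online" % cpu, "w") as f:
        f.write("0")
```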





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/14) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/53019625f9b79b15664205bdfa2d91ff63a162849ce2b6f87e1d79c69b5df1e0).


[Apache TVM Discuss] [Development] Strassen Algorithm for Dense

2020-09-21 Thread Zhao Wu via Apache TVM Discuss


If you want a more robust measurement, you should run it more times and compute 
the average time; for example, you could run it 1000 times (see the sketch below).
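
A minimal sketch of how I usually measure, assuming `m` is a graph-runtime 
`GraphModule` and `ctx` is the device context (these names are just placeholders):

```python
import numpy as np

# time_evaluator runs the "run" function many times and returns per-repeat averages.
ftimer = m.module.time_evaluator("run", ctx, number=100, repeat=10)
prof_res = np.array(ftimer().results) * 1000  # convert seconds to milliseconds
print("mean: %.3f ms, std: %.3f ms" % (np.mean(prof_res), np.std(prof_res)))
```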





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/strassen-algorithm-for-dense/2661/16) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/aa49226226d478314435f8cd251a0a56a38cf2fca52d07617e14465930421a46).


[Apache TVM Discuss] [Development/RFC] [RFC] Optionally include object file generation in tvmc

2020-10-09 Thread Zhao Wu via Apache TVM Discuss


My code review comment is what TQ said. When we call `export_library`, we can save 
`a.tar` or `a.so`. If we save `a.tar`, which contains the object files (like 
a.o), this is different from `tvmc`'s `tar` collection (see the sketch below).
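
A minimal sketch of the two export paths I mean, assuming `lib` is the module 
returned by `relay.build` / `tvm.build`:

```python
# a.so: invokes the host compiler/linker and produces a loadable shared library
lib.export_library("a.so")

# a.tar: packs the per-module object files (e.g. lib0.o, devc.o) into a tar archive,
# leaving the final linking step to the user
lib.export_library("a.tar")
```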





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-optionally-include-object-file-generation-in-tvmc/8120/11)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/44a10989f3416ef0b693fd6e127a37b76aefc54651a61aa6934b1dc2fbf17d98).


[Apache TVM Discuss] [Development/RFC] Add some new tensorflow ops

2020-10-22 Thread Zhao Wu via Apache TVM Discuss


I think we could just send the PRs directly. Of course, we could split them into 
several PRs rather than one big PR.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/add-some-new-tensorflow-ops/8217/3) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/bf37b10f803b1526ba815551fe0d4fea549900fdf4cdcd6dc998e5ed82e0281a).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-26 Thread Zhao Wu via Apache TVM Discuss


[quote="giuseros, post:1, topic:8253"]
`qnn_conv2d_legalize.register`
[/quote]

Does code like this from `alter_op_layout` work?

```
best_plevel_impl, outs = relay.backend.compile_engine.select_implementation(
    relay.op.get("nn.conv2d"), attrs, tinfos, out_type, target)
if best_plevel_impl.name == "conv2d_int16":
    ...
```





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/3)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/38a1a3e285f739f1176ea1417fa9069347a776a14141c8190be4fee28c654f92).


[Apache TVM Discuss] [Development/RFC] [DISCUSS] TVM v0.8 Roadmap

2020-10-26 Thread Zhao Wu via Apache TVM Discuss


Looking forward to it. The TVM auto scheduler is also doing some experiments on 
this. I believe sparse networks have a good future too.





---
[Visit Topic](https://discuss.tvm.apache.org/t/discuss-tvm-v0-8-roadmap/8139/8) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/dfaf35320908af0342def23adb47c0a8ba300469f4ff2284635f9d250c718d78).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-26 Thread Zhao Wu via Apache TVM Discuss


@giuseros I didn't run it, but in my understanding these two functions' inputs 
should be of the same type (tvm.relay.Expr). For example, inside the 
alter_op_layout function we have this logic:

```
# HWIO -> OIHW
kernel_transform = relay.transpose(inputs[1], axes=[3, 2, 0, 1])
# alpha, alpha, CO, CI
weight = relay.nn.contrib_conv2d_winograd_weight_transform(
    kernel_transform, tile_size=tile_size)
```

relay.transpose requires its input's type to be tvm.relay.Expr.

The doc of `conv2d_alter_layout` also says tvm.relay.Expr is required:
```
@tvm.target.generic_func
def conv2d_alter_layout(attrs, inputs, tinfos, out_type):
    """Change Conv2D layout.

    Parameters
    ----------
    attrs : tvm.ir.Attrs
        Attributes of current convolution
    inputs : tvm.relay.Expr
        Grouped input symbols
```





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/5)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/ddfb664ef724a61ef67e877b792a612e48a0096dc3c4abe67c5e7d5257ee5709).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-28 Thread Zhao Wu via Apache TVM Discuss


@giuseros @anijain2305 Let us accept one more argument, like `alter_op_layout` does:

```
@tvm.target.generic_func
def conv2d_alter_layout(attrs, inputs, tinfos, out_type):
    ...

@tvm.target.generic_func
def qnn_conv2d_legalize(attrs, inputs, types):
    """Default legalization is None."""
    return None
```
Then we could leverage `relay.backend.compile_engine.select_implementation`





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/8)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/dd316de373aecf2859b493c070c365411dab1b41987380112e6995b01c52da90).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-30 Thread Zhao Wu via Apache TVM Discuss


For alter_op_layout, we alter the weight layout; normally we change the weight 
layout to 5D, and the last dim is queried from our AutoTVM log file. For 
example:
```
if topi_tmpl == "conv2d_nchw_spatial_pack.arm_cpu":
    assert data_layout == "NCHW" and kernel_layout == "OIHW"
    N, CI, H, W = get_const_tuple(data.shape)
    CO, _, KH, KW = get_const_tuple(kernel.shape)
    VC = cfg['tile_co'].size[-1]
```
If there is no workload, we don't want to change the layout. You could argue that 
we could set one fixed value like 8, but if you do that, you also need to change 
the compute logic of conv2d (like `def conv2d_spatial_pack_nchw`): there, the `VC` 
is `cfg['tile_co']`, not 8.

[quote="giuseros, post:9, topic:8253"]
What should we do in `legalize` ? Simply return back a default legalization?
[/quote]

A default legalization makes sense here (see the sketch below).
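
A hedged sketch of what I mean, reusing the `cfg` names from the alter_op_layout 
snippets above (purely illustrative):

```python
# Inside the alter_op_layout / legalize callback:
if cfg.is_fallback:
    # no tuned entry for this workload, so cfg['tile_co'] cannot be trusted;
    # return None to keep the default lowering instead of altering the layout
    return None
VC = cfg["tile_co"].size[-1]  # only meaningful when a tuned config exists
```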





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/10)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/bd624e5d2d9977f486b202cd335e42ce7929d32c8f40fd8c246f00820a96b6e8).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-30 Thread Zhao Wu via Apache TVM Discuss


[quote="giuseros, post:11, topic:8253"]
What I am missing is why we don’t want to change the layout when 
`cfg.is_fallback` . In that case, the strategy is defined
[/quote]

When we enter the fallback configuration, it means we didn't find a configuration 
for this workload in the tuning log. So, as I replied before, even if I know this 
is `conv2d_nchw_spatial_pack.arm_cpu`, I cannot get 
`cfg['tile_co'].size[-1]`.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/12)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/387f225747e037134536b14ac7a5e17aca025efcd4feab9059c36022b36fd54f).


[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-30 Thread Zhao Wu via Apache TVM Discuss


Ah... you are right, @giuseros, sorry I misled you; I remembered it wrong before. 
We do have one default value, which is 1 if I remember correctly. But even though 
we have a value, it cannot be trusted, because we haven't tuned it. We could fix it 
at 4 or 8, but I don't think that brings much benefit, because when we enter the 
fallback the performance is not guaranteed anyway. If you really want to, you 
could set 4 or 8 as I said when entering the fallback, but it doesn't mean much.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/quantized-models-and-legalization-pass/8253/14)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/e42f8b3b9af83a133c753589b9b702051256515272a7b3ca89f41425eabe44cf).


[Apache TVM Discuss] [Development/RFC] Expand Span for imported module

2020-11-11 Thread Zhao Wu via Apache TVM Discuss


I would like to add one `flags` attribute so that we have more room for extension 
in the future. For example, we could have `Span: (sourcename: ..., line: ..., 
column: ..., flags: SPFlagSourceNameImportedFromModel, ...)`. Then we could query 
the flags attribute to handle the specific condition.





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/expand-span-for-imported-module/8435/2) 
to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/e5d92fa5a65c2c6a88f47203150eef4286ea352856277ca1ba3794a07fc74f84).


[Apache TVM Discuss] [Development] Dynamic Model Support

2020-11-20 Thread Zhao Wu via Apache TVM Discuss


I enjoyed reading https://arxiv.org/abs/2006.03031, which supports dynamic models 
in TVM using the Relay VM. 

However, I want to ask some quick questions:
1. Have we uploaded the Nimble code completely to mainline? Especially 
regarding the memory performance issue like this: 
https://discuss.tvm.apache.org/t/vm-the-performance-degradation-of-vm-runtime-and-dynamic-shape-support-compared-to-graph-runtime/6076/2?u=frozengene.
 In the paper we describe memory planning and heterogeneous device 
placement to solve this.

2. The evaluated platforms include ARM, but only in the cloud. How about 
embedded platforms? Can the VM be part of `libtvm_runtime.so` so that it 
can be built with a cross compiler?

Thanks for the great work! I like this design for solving the problem. 
@haichen @jroesch @zhiics





---
[Visit Topic](https://discuss.tvm.apache.org/t/dynamic-model-support/8491/1) to 
respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/933c3593e20e94106a96fc2ebdf321aa2c84f4e34b0ea5cefb69f25013547aba).


[Apache TVM Discuss] [Development/RFC] [RFC] Building a new reproducible benchmark for TVM

2020-11-21 Thread Zhao Wu via Apache TVM Discuss


One question about performance regression: how do we judge normal fluctuation, 
especially on CPU? For example, ResNet-50 might take 20.00 ms, but become 20.88 ms 
after one PR.
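
One possible way to judge it, as a rough sketch (the numbers and the 3-sigma 
threshold are just placeholders): keep a history of past runs and flag a PR only 
when the new number falls outside the usual noise band.

```python
import statistics

history_ms = [20.01, 19.95, 20.10, 20.05, 19.98]  # hypothetical past results
new_ms = 20.88

mean = statistics.mean(history_ms)
std = statistics.pstdev(history_ms)
if new_ms > mean + 3 * std:
    print("likely regression: %.2f ms vs baseline %.2f +/- %.2f ms" % (new_ms, mean, std))
```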





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-building-a-new-reproducible-benchmark-for-tvm/8496/7)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/89e901316f8e75b07e8a01c296b56e445eabe77c8e950d3bfce00fac7b762c47).

