> @FrozenGene For the output_min and max, isn't the out_dtype enough? If it's
> uint8, we can clamp at 0 and 255. If it's int8, we can clamp at -128 and 127.
> I don't see any reason the values will be any different, unless you want to
> fuse the quantized relu into the quantized convolution from the start.
> > It appears to me this would let them simulate smaller than 8 bit
> > quantizations.
>
> If _simulating smaller than 8 bit_ is the case, 8 bits should be able to hold
> the activation min/max values.
8 bits could hold them. But what are the values of output_min / output_max? I
think @jnorwood wants to express that the limits can be narrower than the dtype
range, for example when simulating a smaller than 8 bit quantization.
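To make that concrete, here is a minimal sketch (not from any of the kernels under discussion; `clamp_to_range` is a hypothetical helper): the same uint8 output buffer can carry different effective ranges, and only the passed-in limits distinguish them.

```c++
#include <algorithm>
#include <cstdint>

// Clamp a requantized accumulator to an explicit activation range.
// For plain uint8 the range is [0, 255]; a simulated 6 bit quantization
// stored in the same uint8 buffer would pass [0, 63] instead.
inline uint8_t clamp_to_range(int32_t acc, int32_t output_min, int32_t output_max) {
  acc = std::max(acc, output_min);
  acc = std::min(acc, output_max);
  return static_cast<uint8_t>(acc);
}

int main() {
  uint8_t full = clamp_to_range(200, 0, 255);  // -> 200 (plain uint8)
  uint8_t sim6 = clamp_to_range(200, 0, 63);   // -> 63  (simulated 6 bit)
  return (full == 200 && sim6 == 63) ? 0 : 1;
}
```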
> @FrozenGene a clarifying question to your above comment. If we pass in the
> output scale and shift, can we not compute int32 -> int8 by simply adding
> more nodes in the graph?
I don't understand your comment fully. Do you mean we could avoid the
int32 -> int8 computation? If so, I think we can not avoid it.
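For context, here is a simplified sketch of the int32 -> uint8 step in question (a hypothetical `requantize` helper; it uses a float rescale for brevity, whereas the TFLite reference kernels use a fixed-point multiplier plus shift):

```c++
#include <algorithm>
#include <cmath>
#include <cstdint>

// Sketch of requantizing a conv2d's int32 accumulator to uint8:
// rescale, add the output zero point, clamp, then narrow. This is the
// computation that cannot be skipped, whether it lives inside q_conv2d
// or in separate graph nodes.
inline uint8_t requantize(int32_t acc, float output_scale,
                          int32_t output_zero_point,
                          int32_t output_min, int32_t output_max) {
  int32_t scaled = static_cast<int32_t>(std::lround(acc * output_scale));
  scaled += output_zero_point;
  scaled = std::max(scaled, output_min);
  scaled = std::min(scaled, output_max);
  return static_cast<uint8_t>(scaled);
}
```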
> It appears to me this would let them simulate smaller than 8 bit quantizations

If *simulating smaller than 8 bit* is the case, 8 bits should be able to hold
the activation min/max values.
The tflite quantized convolution reference implementation passes in both limits
as int32 values. It appears to me this would let them simulate smaller than 8
bit quantizations, if that is something you want to support.
This is from `tensorflow/lite/kernels/internal/reference/conv.h` (excerpt of
the output path, where the accumulator is requantized and then clamped against
the limits passed in):

```c++
acc = MultiplyByQuantizedMultiplier(acc, output_multiplier, output_shift);
acc += output_offset;
acc = std::max(acc, output_activation_min);
acc = std::min(acc, output_activation_max);
```
@FrozenGene For the output_min and max, isn't the out_dtype enough? If it's
uint8, we can clamp at 0 and 255. If it's int8, we can clamp at -128 and 127. I
don't see any reason the values will be any different, unless you want to fuse
the quantized relu into the quantized convolution from the start.
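On the fused-relu case: below is a sketch of how the clamp limits could be derived when a quantized ReLU6 is folded into the convolution (a hypothetical helper, modeled on the activation-range calculation TFLite performs, not copied from it). The limits match [0, 255] only when the zero point is 0 and 6.0 / scale reaches 255, so in general they do carry information beyond the dtype.

```c++
#include <algorithm>
#include <cmath>
#include <cstdint>

// Derive uint8 clamp limits for a convolution with a fused ReLU6.
// Real value 0.0 quantizes to the zero point; real value 6.0 quantizes
// to zero_point + round(6.0 / scale); both are pinned to the uint8 range.
void relu6_activation_range(float output_scale, int32_t output_zero_point,
                            int32_t* output_min, int32_t* output_max) {
  *output_min = std::max<int32_t>(0, output_zero_point);
  *output_max = std::min<int32_t>(
      255,
      output_zero_point + static_cast<int32_t>(std::lround(6.0f / output_scale)));
}
```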
@FrozenGene a clarifying question to your above comment. If we pass in the
output scale and shift, can we not compute int32 -> int8 by simply adding more
nodes in the graph?
@anijain2305 I understand your thought, and I agree we should make the API
minimal. However, no matter which way we choose, q_conv2d's int32 output should
be clamped into the uint8 range. If you don't pass min / max, you still need to
do `output = std::max(output, 0)` and `output = std::min(output, 255)` to keep
the result inside the uint8 range.
@tqchen One thing I wanted to clarify: why isn't the Analyzer class integrated
into the Node hierarchy? Instead, a separate closure-based mechanism is used
for the Python integration, which feels strange and seemingly makes it harder
to create functions that accept Analyzer objects and work across the C++/Python
boundary.