Thanks for this much-needed contribution!
Can you elaborate on the design you imagine for this part of the RFC?
> there needs to be some control over the output data types for converted
> operations. Some FP16 operations might accumulate results into FP32 for
> numerical reasons and some might produce an FP16 number. In the rewrite of
> the graph, we will provide some control over this variable.
In some edge-deployment use cases it is desirable for all parameters to be stored as fp16 to limit the storage footprint. In that context, take the following graph as an example:
```
conv2d -> multiply -> bias_add -> relu -> max_pool
Greenlist{conv2d, max_pool}
Graylist{elemwise}
```
If the conv2d should accumulate in fp32 but the consecutive elemwise operators should run in fp16, how will a user express this? In this case I would expect the final graph to be:
```
[fp16] -> conv2d -> [fp32] -> cast(fp16) -> multiply -> bias_add -> relu -> max_pool
```
which, after fusion, becomes:
```
fused_conv2d_cast_multiply_bias_add_relu -> max_pool
```
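To make that target concrete, here is a minimal Relay sketch of the expected result, assuming the rewrite can lean on conv2d's existing `out_dtype` attribute for fp32 accumulation; shapes and attribute values are just illustrative:

```python
import tvm
from tvm import relay

# fp16 storage for the input and all parameters; conv2d accumulates in
# fp32 via its out_dtype attribute, then a cast returns to fp16 so the
# gray-list elemwise chain and max_pool run in fp16.
data = relay.var("data", shape=(1, 16, 32, 32), dtype="float16")
weight = relay.var("weight", shape=(16, 16, 3, 3), dtype="float16")
scale = relay.var("scale", shape=(16, 1, 1), dtype="float16")
bias = relay.var("bias", shape=(16,), dtype="float16")

conv = relay.nn.conv2d(data, weight, padding=(1, 1), out_dtype="float32")
x = relay.cast(conv, "float16")  # the inserted [fp32] -> cast(fp16) step
x = relay.multiply(x, scale)
x = relay.nn.bias_add(x, bias)
x = relay.nn.relu(x)
x = relay.nn.max_pool2d(x, pool_size=(2, 2), strides=(2, 2))

func = relay.Function(relay.analysis.free_vars(x), x)
mod = tvm.IRModule.from_expr(func)
print(mod)
```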
An alternative option would be to add an accumulator_dtype attribute, separate from output_dtype, to the relevant operators and rewrite based on that field (a rough sketch follows at the end of this post). Both can work, but I'd like to hear more about how you envision doing this with the mixed precision transform in the above context. Thanks!