On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > The main issue is complex _Float16 functions in libgcc. If _Float16 doesn't
> > require -mavx512fp16, we need to compile complex _Float16 functions in
> > libgcc without -mavx512fp16. Complex _Float16 performance is very
> > important for our _Float16 usage. _Float16 performance has to be
> > very fast. There should be no emulation anywhere when -mavx512fp16
> > is used. That is why _Float16 is available only with -mavx512fp16.
>
> It should be possible to emulate scalar _Float16 using _Float32 with a
> reasonable performance trade-off. I think users caring for _Float16
> performance will use vector intrinsics anyway since for scalar code
> _Float32 code will likely perform the same (at double storage cost).

Only if _Float16 is allowed to have excess precision. If not, then
one would need to round (expensively?) after every operation, at least.

	Jakub