Re: [PATCH 0/2] Initial support for AVX512FP16

Hongtao Liu via Gcc-patches Tue, 06 Jul 2021 01:46:48 -0700

On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
<[email protected]> wrote:
>
> On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > The main issue is complex _Float16 functions in libgcc.  If _Float16
> > > doesn't require -mavx512fp16, we need to compile complex _Float16 functions in
> > > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > > important for our _Float16 usage.   _Float16 performance has to be
> > > very fast.  There should be no emulation anywhere when -mavx512fp16
> > > is used.   That is why _Float16 is available only with -mavx512fp16.
> >
> > It should be possible to emulate scalar _Float16 using _Float32 with a
> > reasonable performance trade-off.  I think users caring for _Float16 performance will
> > use vector intrinsics anyway since for scalar code _Float32 code will likely
> > perform the same (at double storage cost)
>
> Only if it is allowed to have excess precision for _Float16.  If not, then
> one would need to (expensively?) round after every operation at least.
There may be inconsistent behavior between soft-fp and avx512fp16
instructions if we emulate _Float16 w/ float, e.g.:
  1) For a + b - c, where b and c are variables holding the same large
value and a + b overflows at _Float16 but is a finite value at float,
the avx512fp16 instruction will raise an exception but soft-fp won't
(unless the result is rounded after every operation).
  2) For a / b, where b is a denormal value: AVX512FP16 won't flush it to
zero even w/ -Ofast, but when it's extended to float and divss is used,
it will be flushed to zero and raise an exception when compiling w/
-Ofast.
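
To make case 1) concrete, here is a minimal sketch of the excess-precision
effect. It is only an analogy: float stands in for _Float16 and double
stands in for the wider emulation type, so that it builds without any
_Float16 support. The function names are made up for illustration and are
not part of any patch.

```c
#include <float.h>
#include <math.h>

/* Analogy only: float plays the role of _Float16 and double plays the
   role of the wider type used for emulation.  Both helper names are
   hypothetical.  */

/* Round after every operation, as native narrow-type arithmetic does.
   The volatile store forces the intermediate sum into float, so it can
   overflow to +Inf before the subtraction happens.  */
static float
sum_diff_rounded (float a, float b, float c)
{
  volatile float t = a + b;
  return t - c;
}

/* Keep the intermediate in the wider type and round only once at the
   end.  The overflow in a + b never happens, so the result is finite.  */
static float
sum_diff_excess (float a, float b, float c)
{
  double t = (double) a + (double) b;
  return (float) (t - (double) c);
}
```

With a = b = c = FLT_MAX, sum_diff_rounded returns +Inf, while
sum_diff_excess returns FLT_MAX.  The same kind of divergence is what I
would expect between native _Float16 arithmetic and float emulation
without intermediate rounding.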


To solve the above issue, I tried to add full emulation for _Float16 (for
everything under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, etc.). The
problem is in pass_expand: it always tries a wider mode first instead of
using soft-fp:

  /* Look for a wider mode of the same class for which we think we
     can open-code the operation.  Check for a widening multiply at the
     wider mode as well.  */

  if (CLASS_HAS_WIDER_MODES_P (mclass)
      && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
    FOR_EACH_WIDER_MODE (wider_mode, mode)

I assume pass_expand does this for a reason, so I'm a little afraid
to touch this part of the code.

So the key point is that soft-fp and the avx512fp16 instructions may
not behave the same with respect to exceptions. Is this acceptable?

BTW, I've finished an initial patch to enable _Float16 on sse2 and
emulate _Float16 operations w/ float. It passes all 312 new tests
related to _Float16, but those unit tests don't cover the
scenario I'm talking about.
>
>         Jakub
>


-- 
BR,
Hongtao
