Re: Enable BF16 support (Please ignore my former email)

Hongtao Liu Wed, 08 May 2019 02:15:55 -0700

Sorry for the indentation issue, and thanks for your reminder.

On Wed, May 8, 2019 at 3:39 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, May 8, 2019 at 5:06 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Wed, May 8, 2019 at 2:33 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Tue, May 7, 2019 at 8:49 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > > > > > > > > > >     This patch is about to enable support for bfloat16 
> > > > > > > > > > > > which will be in Future Cooper Lake, Please refer to 
> > > > > > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
> > > > > > > > > > > > for more details about BF16.
> > > > > > > > > > > >
> > > > > > > > > > > > There are 3 instructions for AVX512BF16: 
> > > > > > > > > > > > VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS 
> > > > > > > > > > > > instructions, which are Vector Neural Network 
> > > > > > > > > > > > Instructions supporting:
> > > > > > > > > > > >
> > > > > > > > > > > > -       VCVTNE2PS2BF16: Convert Two Packed Single Data 
> > > > > > > > > > > > to One Packed BF16 Data.
> > > > > > > > > > > > -       VCVTNEPS2BF16: Convert Packed Single Data to 
> > > > > > > > > > > > Packed BF16 Data.
> > > > > > > > > > > > -       VDPBF16PS: Dot Product of BF16 Pairs 
> > > > > > > > > > > > Accumulated into Packed Single Precision.
> > > > > > > > > > > >
> > > > > > > > > > > > Since only BF16 intrinsics are supported, we treat it 
> > > > > > > > > > > > as HI for simplicity.
> > > > > > > > > > >
> > > > > > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps 
> > > > > > > > > > > using HImode
> > > > > > > > > > > instead of HFmode. Is there a compelling reason not to 
> > > > > > > > > > > introduce
> > > > > > > > > > > corresponding bf16_format supporting infrastructure and 
> > > > > > > > > > > declare these
> > > > > > > > > > > intrinsics using half-binary (HBmode ?) mode instead?
> > > > > > > > > > >
> > > > > > > > > > > Uros.
> > > > > > > > > >
> > > > > > > > > > Bfloat16 isn't IEEE standard which we want to reserve 
> > > > > > > > > > HFmode for.
> > > > > > > > >
> > > > > > > > > True.
> > > > > > > > >
> > > > > > > > > > The IEEE 754 standard specifies a binary16 as having the 
> > > > > > > > > > following format:
> > > > > > > > > > Sign bit: 1 bit
> > > > > > > > > > Exponent width: 5 bits
> > > > > > > > > > Significand precision: 11 bits (10 explicitly stored)
> > > > > > > > > >
> > > > > > > > > > Bfloat16 has the following format:
> > > > > > > > > > Sign bit: 1 bit
> > > > > > > > > > Exponent width: 8 bits
> > > > > > > > > > Significand precision: 8 bits (7 explicitly stored), as 
> > > > > > > > > > opposed to 24
> > > > > > > > > > bits in a classical single-precision floating-point format
> > > > > > > > >
> > > > > > > > > This is why I proposed to introduce HBmode (and corresponding
> > > > > > > > > bfloat16_format) to distingush between ieee HFmode and BFmode.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Unless there is BF16 language level support,  HBmode has no 
> > > > > > > > advantage
> > > > > > > > over HImode.   We can add HBmode when we gain BF16 language 
> > > > > > > > support.
> > > > > > > >
> > > > > > > > --
> > > > > > > > H.J.
> > > > > > >
> > > > > > > Any other comments, I'll merge this to trunk?
> > > > > >
> > > > > > It is not a regression, so please no.
> > > > >
> > > > > Ehm, "regression fix" ...
> > > > >
> > > > > Uros.
> > > >
> > > > Update patch.
> > >
> > > Index: gcc/config/i386/i386-builtins.c
> > > ===================================================================
> > > --- gcc/config/i386/i386-builtins.c    (revision 270934)
> > > +++ gcc/config/i386/i386-builtins.c    (working copy)
> > > @@ -1920,6 +1920,7 @@
> > >    F_VPCLMULQDQ,
> > >    F_AVX512VNNI,
> > >    F_AVX512BITALG,
> > > +  F_AVX512BF16,
> > >    F_MAX
> > >  };
> > >
> > > @@ -2064,7 +2065,8 @@
> > >    {"gfni",    F_GFNI,    P_ZERO},
> > >    {"vpclmulqdq", F_VPCLMULQDQ, P_ZERO},
> > >    {"avx512vnni", F_AVX512VNNI, P_ZERO},
> > > -  {"avx512bitalg", F_AVX512BITALG, P_ZERO}
> > > +  {"avx512bitalg", F_AVX512BITALG, P_ZERO},
> > > +  {"avx512bf16", F_AVX512BF16, P_ZERO}
> > >  };
> > >
> > >  /* This parses the attribute arguments to target in DECL and determines
> > >
> > > You also need to update cpuinfo.h and cpuinfo.c in libgcc/config/i386
> > > with avx512bf16, plus relevant test files.
> > >
> > > Index: gcc/testsuite/gcc.target/i386/avx-1.c
> > > Index: gcc/testsuite/gcc.target/i386/avx-2.c
> > >
> > > No need to update above two files, sse-*.c changes are enough to cover
> > > new functionality.
> > >
> > > Otherwise LGTM, but please repost updated patch with the ChangeLog
> > > entry (please see [1]).
> > >
> > > [1] https://www.gnu.org/software/gcc/contribute.html#patches
> > >
> > > Uros.
> >
> > Update patch:
> > 1. Add Changelog.
>
> In fututre, please add ChangeLog entries as plaintext to the message,
> as instructed in [1]. Also, please read the links to GCC coding
> conventions and GNU Coding Standards for further information.
>
> [1] https://www.gnu.org/software/gcc/contribute.html#patches
>
> > 2. Update libgcc part.
>
> Please add one line of space after
>
> +2019-05-07  Wei Xiao  <wei3.x...@intel.com>
> +
> +    *
>
> in all ChangeLog entries (please follow the form of existing entries
> as an example).
>
> +    * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
> +    * gcc.target/i386/sse-12.c: Add -mavx512bf16.
> +    * gcc.target/i386/sse-13.c: Ditto.
> +    * gcc.target/i386/sse-14.c: Ditto.
> +    * gcc.target/i386/sse-22.c: Ditto.
> +    * gcc.target/i386/sse-23.c: Ditto.
> +    * g++.dg/other/i386-2.C: Ditto.
> +    * g++.dg/other/i386-3.C: Add avx512bf16.
> +
> +2019-05-07 Hongtao Liu <hongtao....@intel.com>
> +    * gcc.target/i386/builtin_target.c: Ditto.
>
> You should group entries a bit and write:
>
>     * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
>     * gcc.target/i386/builtin_target.c: Handle avx512bf16.
>     * gcc.target/i386/sse-12.c: Add -mavx512bf16.
>     * gcc.target/i386/sse-13.c: Ditto.
>     * gcc.target/i386/sse-14.c: Ditto.
>     * gcc.target/i386/sse-22.c: Ditto.
>     * gcc.target/i386/sse-23.c: Ditto.
>     * g++.dg/other/i386-2.C: Ditto.
>     * g++.dg/other/i386-3.C: Ditto.
>
> @@ -265,6 +265,10 @@
>      assert (__builtin_cpu_supports ("avx5124vnniw"));
>        if (edx & bit_AVX5124FMAPS)
>      assert (__builtin_cpu_supports ("avx5124fmaps"));
> +
> +      __cpuid_count (7, 1, eax, ebx, ecx, edx);
> +      if (eax & bit_AVX512BF16)
> +          assert (__builtin_cpu_supports ("avx512bf16"));
>      }
>
> Please fix indentation here (too many leading spaces).
>
> OK for mainline with the above fixes.
>
> Thanks,
> Uros.




-- 
BR,
Hongtao

Re: Enable BF16 support (Please ignore my former email)

Reply via email to