On Tue, May 7, 2019 at 8:49 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > > > > > > This patch is about to enable support for bfloat16 which
> > > > > > > > > will be in future Cooper Lake. Please refer to
> > > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
> > > > > > > > > for more details about BF16.
> > > > > > > > >
> > > > > > > > > There are 3 instructions for AVX512BF16: VCVTNE2PS2BF16,
> > > > > > > > > VCVTNEPS2BF16 and VDPBF16PS instructions, which are Vector
> > > > > > > > > Neural Network Instructions supporting:
> > > > > > > > >
> > > > > > > > > - VCVTNE2PS2BF16: Convert Two Packed Single Data to One
> > > > > > > > >   Packed BF16 Data.
> > > > > > > > > - VCVTNEPS2BF16: Convert Packed Single Data to Packed
> > > > > > > > >   BF16 Data.
> > > > > > > > > - VDPBF16PS: Dot Product of BF16 Pairs Accumulated into
> > > > > > > > >   Packed Single Precision.
> > > > > > > > >
> > > > > > > > > Since only BF16 intrinsics are supported, we treat it as HI
> > > > > > > > > for simplicity.
> > > > > > > >
> > > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps using
> > > > > > > > HImode instead of HFmode. Is there a compelling reason not to
> > > > > > > > introduce corresponding bf16_format supporting infrastructure
> > > > > > > > and declare these intrinsics using half-binary (HBmode ?) mode
> > > > > > > > instead?
> > > > > > > >
> > > > > > > > Uros.
> > > > > > >
> > > > > > > Bfloat16 isn't IEEE standard which we want to reserve HFmode for.
> > > > > >
> > > > > > True.
> > > > > >
> > > > > > > The IEEE 754 standard specifies a binary16 as having the
> > > > > > > following format:
> > > > > > > Sign bit: 1 bit
> > > > > > > Exponent width: 5 bits
> > > > > > > Significand precision: 11 bits (10 explicitly stored)
> > > > > > >
> > > > > > > Bfloat16 has the following format:
> > > > > > > Sign bit: 1 bit
> > > > > > > Exponent width: 8 bits
> > > > > > > Significand precision: 8 bits (7 explicitly stored), as opposed
> > > > > > > to 24 bits in a classical single-precision floating-point format
> > > > > >
> > > > > > This is why I proposed to introduce HBmode (and corresponding
> > > > > > bfloat16_format) to distinguish between ieee HFmode and BFmode.
> > > > > >
> > > > >
> > > > > Unless there is BF16 language level support, HBmode has no advantage
> > > > > over HImode. We can add HBmode when we gain BF16 language support.
> > > > >
> > > > > --
> > > > > H.J.
> > > >
> > > > Any other comments, I'll merge this to trunk?
> > >
> > > It is not a regression, so please no.
> >
> > Ehm, "regression fix" ...
> >
> > Uros.
>
> Update patch.

Index: gcc/config/i386/i386-builtins.c
===================================================================
--- gcc/config/i386/i386-builtins.c (revision 270934)
+++ gcc/config/i386/i386-builtins.c (working copy)
@@ -1920,6 +1920,7 @@
   F_VPCLMULQDQ,
   F_AVX512VNNI,
   F_AVX512BITALG,
+  F_AVX512BF16,
   F_MAX
 };

@@ -2064,7 +2065,8 @@
   {"gfni", F_GFNI, P_ZERO},
   {"vpclmulqdq", F_VPCLMULQDQ, P_ZERO},
   {"avx512vnni", F_AVX512VNNI, P_ZERO},
-  {"avx512bitalg", F_AVX512BITALG, P_ZERO}
+  {"avx512bitalg", F_AVX512BITALG, P_ZERO},
+  {"avx512bf16", F_AVX512BF16, P_ZERO}
 };

 /* This parses the attribute arguments to target in DECL and determines

You also need to update cpuinfo.h and cpuinfo.c in libgcc/config/i386
with avx512bf16, plus relevant test files.

Index: gcc/testsuite/gcc.target/i386/avx-1.c
Index: gcc/testsuite/gcc.target/i386/avx-2.c

No need to update the above two files; the sse-*.c changes are enough
to cover the new functionality.

Otherwise LGTM, but please repost the updated patch with the ChangeLog
entry (please see [1]).

[1] https://www.gnu.org/software/gcc/contribute.html#patches

Uros.
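
For readers following the format comparison quoted above, here is a
minimal sketch in plain C of how a single-precision value maps onto the
bfloat16 layout (1 sign bit, 8 exponent bits, 7 stored significand
bits): keep the upper 16 bits of the float and round to nearest, ties
to even. The helper names float_to_bf16_bits and bf16_bits_to_float are
hypothetical and are not part of the patch or of any GCC intrinsic
header. The sketch makes no attempt to model the exact behaviour of
VCVTNEPS2BF16 (for example its denormal handling), and it keeps the
result in a uint16_t, mirroring the thread's choice of treating BF16 as
a 16-bit integer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): convert a float to bfloat16
   bits using round-to-nearest-even on the discarded low 16 bits.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);        /* Reinterpret the float's bits.  */

  /* NaN: keep sign and exponent, force a quiet-NaN significand bit so
     truncation cannot turn the NaN into an infinity.  */
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((u >> 16) | 0x0040u);

  /* Adding 0x7fff plus the lowest kept bit rounds ties to even.  */
  uint32_t rounding_bias = 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) ((u + rounding_bias) >> 16);
}

/* Hypothetical helper: widen bfloat16 bits back to float.  This is
   exact, because bfloat16 is the upper half of a single-precision
   value.  */
static float
bf16_bits_to_float (uint16_t h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```

With these helpers, float_to_bf16_bits (1.0f) yields 0x3F80 and
bf16_bits_to_float (0x3F80) gives back 1.0f exactly. Since the exponent
field is the same 8 bits as in single precision, the conversion only
gives up significand bits, which is the difference from binary16
discussed above.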