Sorry for the indentation issue, and thanks for your reminder. On Wed, May 8, 2019 at 3:39 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Wed, May 8, 2019 at 5:06 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > On Wed, May 8, 2019 at 2:33 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > On Tue, May 7, 2019 at 8:49 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > > > > > > > > This patch is about to enable support for bfloat16 > > > > > > > > > > > > which will be in Future Cooper Lake, Please refer to > > > > > > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference > > > > > > > > > > > > for more details about BF16. > > > > > > > > > > > > > > > > > > > > > > > > There are 3 instructions for AVX512BF16: > > > > > > > > > > > > VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS > > > > > > > > > > > > instructions, which are Vector Neural Network > > > > > > > > > > > > Instructions supporting: > > > > > > > > > > > > > > > > > > > > > > > > - VCVTNE2PS2BF16: Convert Two Packed Single Data > > > > > > > > > > > > to One Packed BF16 Data. > > > > > > > > > > > > - VCVTNEPS2BF16: Convert Packed Single Data to > > > > > > > > > > > > Packed BF16 Data. > > > > > > > > > > > > - VDPBF16PS: Dot Product of BF16 Pairs > > > > > > > > > > > > Accumulated into Packed Single Precision. > > > > > > > > > > > > > > > > > > > > > > > > Since only BF16 intrinsics are supported, we treat it > > > > > > > > > > > > as HI for simplicity. > > > > > > > > > > > > > > > > > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps > > > > > > > > > > > using HImode > > > > > > > > > > > instead of HFmode. Is there a compelling reason not to > > > > > > > > > > > introduce > > > > > > > > > > > corresponding bf16_format supporting infrastructure and > > > > > > > > > > > declare these > > > > > > > > > > > intrinsics using half-binary (HBmode ?) mode instead? > > > > > > > > > > > > > > > > > > > > > > Uros. > > > > > > > > > > > > > > > > > > > > Bfloat16 isn't IEEE standard which we want to reserve > > > > > > > > > > HFmode for. > > > > > > > > > > > > > > > > > > True. > > > > > > > > > > > > > > > > > > > The IEEE 754 standard specifies a binary16 as having the > > > > > > > > > > following format: > > > > > > > > > > Sign bit: 1 bit > > > > > > > > > > Exponent width: 5 bits > > > > > > > > > > Significand precision: 11 bits (10 explicitly stored) > > > > > > > > > > > > > > > > > > > > Bfloat16 has the following format: > > > > > > > > > > Sign bit: 1 bit > > > > > > > > > > Exponent width: 8 bits > > > > > > > > > > Significand precision: 8 bits (7 explicitly stored), as > > > > > > > > > > opposed to 24 > > > > > > > > > > bits in a classical single-precision floating-point format > > > > > > > > > > > > > > > > > > This is why I proposed to introduce HBmode (and corresponding > > > > > > > > > bfloat16_format) to distingush between ieee HFmode and BFmode. > > > > > > > > > > > > > > > > > > > > > > > > > Unless there is BF16 language level support, HBmode has no > > > > > > > > advantage > > > > > > > > over HImode. We can add HBmode when we gain BF16 language > > > > > > > > support. > > > > > > > > > > > > > > > > -- > > > > > > > > H.J. > > > > > > > > > > > > > > Any other comments, I'll merge this to trunk? > > > > > > > > > > > > It is not a regression, so please no. > > > > > > > > > > Ehm, "regression fix" ... > > > > > > > > > > Uros. > > > > > > > > Update patch. > > > > > > Index: gcc/config/i386/i386-builtins.c > > > =================================================================== > > > --- gcc/config/i386/i386-builtins.c (revision 270934) > > > +++ gcc/config/i386/i386-builtins.c (working copy) > > > @@ -1920,6 +1920,7 @@ > > > F_VPCLMULQDQ, > > > F_AVX512VNNI, > > > F_AVX512BITALG, > > > + F_AVX512BF16, > > > F_MAX > > > }; > > > > > > @@ -2064,7 +2065,8 @@ > > > {"gfni", F_GFNI, P_ZERO}, > > > {"vpclmulqdq", F_VPCLMULQDQ, P_ZERO}, > > > {"avx512vnni", F_AVX512VNNI, P_ZERO}, > > > - {"avx512bitalg", F_AVX512BITALG, P_ZERO} > > > + {"avx512bitalg", F_AVX512BITALG, P_ZERO}, > > > + {"avx512bf16", F_AVX512BF16, P_ZERO} > > > }; > > > > > > /* This parses the attribute arguments to target in DECL and determines > > > > > > You also need to update cpuinfo.h and cpuinfo.c in libgcc/config/i386 > > > with avx512bf16, plus relevant test files. > > > > > > Index: gcc/testsuite/gcc.target/i386/avx-1.c > > > Index: gcc/testsuite/gcc.target/i386/avx-2.c > > > > > > No need to update above two files, sse-*.c changes are enough to cover > > > new functionality. > > > > > > Otherwise LGTM, but please repost updated patch with the ChangeLog > > > entry (please see [1]). > > > > > > [1] https://www.gnu.org/software/gcc/contribute.html#patches > > > > > > Uros. > > > > Update patch: > > 1. Add Changelog. > > In fututre, please add ChangeLog entries as plaintext to the message, > as instructed in [1]. Also, please read the links to GCC coding > conventions and GNU Coding Standards for further information. > > [1] https://www.gnu.org/software/gcc/contribute.html#patches > > > 2. Update libgcc part. > > Please add one line of space after > > +2019-05-07 Wei Xiao <wei3.x...@intel.com> > + > + * > > in all ChangeLog entries (please follow the form of existing entries > as an example). > > + * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test. > + * gcc.target/i386/sse-12.c: Add -mavx512bf16. > + * gcc.target/i386/sse-13.c: Ditto. > + * gcc.target/i386/sse-14.c: Ditto. > + * gcc.target/i386/sse-22.c: Ditto. > + * gcc.target/i386/sse-23.c: Ditto. > + * g++.dg/other/i386-2.C: Ditto. > + * g++.dg/other/i386-3.C: Add avx512bf16. > + > +2019-05-07 Hongtao Liu <hongtao....@intel.com> > + * gcc.target/i386/builtin_target.c: Ditto. > > You should group entries a bit and write: > > * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test. > * gcc.target/i386/builtin_target.c: Handle avx512bf16. > * gcc.target/i386/sse-12.c: Add -mavx512bf16. > * gcc.target/i386/sse-13.c: Ditto. > * gcc.target/i386/sse-14.c: Ditto. > * gcc.target/i386/sse-22.c: Ditto. > * gcc.target/i386/sse-23.c: Ditto. > * g++.dg/other/i386-2.C: Ditto. > * g++.dg/other/i386-3.C: Ditto. > > @@ -265,6 +265,10 @@ > assert (__builtin_cpu_supports ("avx5124vnniw")); > if (edx & bit_AVX5124FMAPS) > assert (__builtin_cpu_supports ("avx5124fmaps")); > + > + __cpuid_count (7, 1, eax, ebx, ecx, edx); > + if (eax & bit_AVX512BF16) > + assert (__builtin_cpu_supports ("avx512bf16")); > } > > Please fix indentation here (too many leading spaces). > > OK for mainline with the above fixes. > > Thanks, > Uros.
-- BR, Hongtao