On Tue, Nov 10, 2020 at 9:08 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Mon, Nov 9, 2020 at 8:26 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Mon, Nov 9, 2020 at 11:31 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > > > > + /* Support unified builtin. */ > > > > + || (mask2 == OPTION_MASK_ISA2_AVXVNNI) > > > > > > > > I don't think we gain anything with unified builtins. Better, just > > > > introduce separate builtins, e.g for > > > > > > > > > > Unified builtins are used for unified intrinsics, intrinsics users may > > > prefer > > > same interface and let compiler decide encoding version. Separate > > > buitins may cause > > > some defination ambiguous when target attribute is used, see avx-vnni-2.c. > > > We also provide separate intrinsics interface for compatibility with > > > different compilers(llvm/msvc/icc). > > > > Hm, the new intrinsics file introduces: > > > > +#ifdef __AVXVNNI__ > > +#define _mm256_dpbusd_avx_epi32(A, B, C) \ > > + _mm256_dpbusd_epi32((A), (B), (C)) > > ... > > +#endif /* __AVXVNNI__ */ > > + > > +#define _mm256_dpbusd_epi32(A, B, C) \ > > + ((__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) (A), \ > > + (__v8si) (B), \ > > + (__v8si) (C))) > > + > > > > And there are two versions of intrinsics: > > > > _mm256_dpbusd_avx_epi32 > > _mm256_dpbusd_epi32 > > > > So, is _mm256_dpusb_epi32 active for either > > > > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL > > > > or > > > > OPTION_MASK_ISA2_AVXVNNI ? > > > Yes. > > Is _mm265_dpbusb_avx_epi32 the "compatibility intrinsics"? > > > > In case the above is correct, please expand the comment > > > > + /* Support unified builtin. */ > > + || (mask2 == OPTION_MASK_ISA2_AVXVNNI) > > > > with the above information, what kind of unified builtin is this. > > > > To explain this, I want to introduce some background first. the first > version of avx-vnni intrinsics are like below(same as you mentioned in > the former email) > extern __inline __m256i > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _mm256_dpwssds_avx_epi32(__m256i __A,__m256i __B,__m256i __C) > { > return (__m256i) __builtin_ia32_vpdpwssds_avx_v8si ((__v8si) __A, > (__v8si) __B, > (__v8si) __C); > } > > The implementation is normal as before. Then some intrinsics users ask > us to provide an unified intrinsics interface for the same > instruction. The request is reasonable, but the problem is avx-vnni is > independent of avx512-vnni, which means avx512-vnni doesn't imply > avx-vnni. It causes some difficulties in implementation. > If we only change patterns in sse.md(to generate vex encoding when > avx-vnni exists), it would fail gcc.target/i386/avx-vnni-2.c since > _mm_dpwssds_epi32 is define with > target_attribute("avx512vl,avx512vnni"). > if we remove target_attribute and still define _mm_dpwssds_epi32 as > "extern __inline" function, gcc would fail to build since > __builtin_ia32_vpdpwssds_v4si request target avx512vl and avx512vnni. > Finally we decided to use macros. To do this, we need to provide > unified builtins, or else there would be some issue in mapping > intrinsics to builtins. i.e: we can't do something like below, it > would still fail gcc.target/i386/avx-vnni-2.c. > > #if defined(__AVX512VNNI__) && defined(__AVX512VL__) > define _mm_dpwssds_epi32 (A, B, C)\ > ((__m128i) __builtin_ia32_vpdpwssds_v4si (A,B,C)) > #endif > > #ifdef __AVXVNNI__ > define _mm_dpwssds_epi32 (A, B, C)\ > ((__m128i) __builtin_ia32_vpdpwssds_avx_v4si (A,B,C)) > #endif > > Also we hack some code to support unified builtin. > > + if ((((bisa & (OPTION_MASK_ISA_AVX512VNNI | > OPTION_MASK_ISA_AVX512VL)) > + == (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL)) > + || (bisa2 & OPTION_MASK_ISA2_AVXVNNI) != 0) > + && (((isa & (OPTION_MASK_ISA_AVX512VNNI | > OPTION_MASK_ISA_AVX512VL)) > + == (OPTION_MASK_ISA_AVX512VNNI | > OPTION_MASK_ISA_AVX512VL)) > + || (isa2 & OPTION_MASK_ISA2_AVXVNNI) != 0)) > + { > + isa |= OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL; > + isa2 |= OPTION_MASK_ISA2_AVXVNNI; > + } > + > > This for > > > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL > > > > or > > > > OPTION_MASK_ISA2_AVXVNNI ? > > > > And this > > + /* Support unified builtin. */ > + || (mask2 == OPTION_MASK_ISA2_AVXVNNI) > > for gcc/testsuite/lib/target-supports.exp where only builtin is > available to check toolchain existence(we may use inline assembly > instead, but it expose an issue), def_builtin should define > __builtin_ia32_vpdpwssds_v4si whenever avx512-vnni && avx512vl or > avx-vnni exist, see gcc/testsuite/gcc.target/i386/avxvnni-builtin.c > > > Please also note that #defines won't be tested in e.g. sse-13.c, where: > > > > --q-- > > Defining away "extern" and "__inline" results in all of them being > > compiled as proper functions. */ > > > > #define extern > > #define __inline > > --/q-- > > > > so these defines should be reimplemented as extern inline functions. > > > > I reimplemented _mm_dpbusd_avx_epi32 as extern inline functions, but > still keep _mm_dpbusd_epi32 as macros. Maybe we should also add some > explicit O0 tests for _mm_dpbusd_epi32.
Thanks, this and the explanation above clears my concerns. + /* Support unified builtin. */ + || (mask2 == OPTION_MASK_ISA2_AVXVNNI) Please expand the above comment. We don't know what unified builtin is. LGTM with the above change. Thanks, Uros. > > Uros. > > > > -- > BR, > Hongtao