On Tue, Nov 10, 2020 at 9:08 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Mon, Nov 9, 2020 at 8:26 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Mon, Nov 9, 2020 at 11:31 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > >
> > > > +      /* Support unified builtin.  */
> > > > +      || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
> > > >
> > > > I don't think we gain anything with unified builtins. Better, just
> > > > introduce separate builtins, e.g for
> > > >
> > >
> > > Unified builtins are used for unified intrinsics, intrinsics users may 
> > > prefer
> > > same interface and let compiler decide encoding version. Separate
> > > buitins may cause
> > > some defination ambiguous when target attribute is used, see avx-vnni-2.c.
> > > We also provide separate intrinsics interface for compatibility with
> > > different compilers(llvm/msvc/icc).
> >
> > Hm, the new intrinsics file introduces:
> >
> > +#ifdef __AVXVNNI__
> > +#define _mm256_dpbusd_avx_epi32(A, B, C) \
> > +  _mm256_dpbusd_epi32((A), (B), (C))
> > ...
> > +#endif /* __AVXVNNI__ */
> > +
> > +#define _mm256_dpbusd_epi32(A, B, C)                \
> > +  ((__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) (A),    \
> > +                       (__v8si) (B),    \
> > +                       (__v8si) (C)))
> > +
> >
> > And there are two versions of intrinsics:
> >
> > _mm256_dpbusd_avx_epi32
> > _mm256_dpbusd_epi32
> >
> > So, is _mm256_dpusb_epi32 active for either
> >
> > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
> >
> > or
> >
> > OPTION_MASK_ISA2_AVXVNNI ?
> >
> Yes.
> > Is _mm265_dpbusb_avx_epi32 the "compatibility intrinsics"?
> >
> > In case the above is correct, please expand the comment
> >
> > +      /* Support unified builtin.  */
> > +      || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
> >
> > with the above information, what kind of unified builtin is this.
> >
>
> To explain this, I want to introduce some background first. the first
> version of avx-vnni intrinsics are like below(same as you mentioned in
> the former email)
> extern __inline __m256i
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> _mm256_dpwssds_avx_epi32(__m256i __A,__m256i __B,__m256i __C)
> {
>   return (__m256i) __builtin_ia32_vpdpwssds_avx_v8si ((__v8si) __A,
>                                                   (__v8si) __B,
>                                                   (__v8si) __C);
> }
>
> The implementation is normal as before. Then some intrinsics users ask
> us to provide an unified intrinsics interface for the same
> instruction. The request is reasonable, but the problem is avx-vnni is
> independent of avx512-vnni, which means avx512-vnni doesn't imply
> avx-vnni. It causes some difficulties in implementation.
> If we only change patterns in sse.md(to generate vex encoding when
> avx-vnni exists), it would fail gcc.target/i386/avx-vnni-2.c since
> _mm_dpwssds_epi32 is define with
> target_attribute("avx512vl,avx512vnni").
> if we remove target_attribute and still define _mm_dpwssds_epi32 as
> "extern __inline" function, gcc would fail to build since
> __builtin_ia32_vpdpwssds_v4si request target avx512vl and avx512vnni.
> Finally we decided to use macros. To do this, we need to provide
> unified builtins, or else there would be some issue in mapping
> intrinsics to builtins. i.e: we can't do something like below, it
> would still fail gcc.target/i386/avx-vnni-2.c.
>
> #if defined(__AVX512VNNI__) && defined(__AVX512VL__)
> define _mm_dpwssds_epi32 (A, B, C)\
>  ((__m128i) __builtin_ia32_vpdpwssds_v4si (A,B,C))
> #endif
>
> #ifdef __AVXVNNI__
> define _mm_dpwssds_epi32 (A, B, C)\
>  ((__m128i) __builtin_ia32_vpdpwssds_avx_v4si (A,B,C))
> #endif
>
> Also we hack some code to support unified builtin.
>
> +  if ((((bisa & (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +        == (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> +       || (bisa2 & OPTION_MASK_ISA2_AVXVNNI) != 0)
> +      && (((isa & (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +           == (OPTION_MASK_ISA_AVX512VNNI |
> OPTION_MASK_ISA_AVX512VL))
> +          || (isa2 & OPTION_MASK_ISA2_AVXVNNI) != 0))
> +    {
> +      isa |= OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL;
> +      isa2 |= OPTION_MASK_ISA2_AVXVNNI;
> +    }
> +
>
> This for
>
> > OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL
> >
> > or
> >
> > OPTION_MASK_ISA2_AVXVNNI ?
> >
>
> And this
>
> +          /* Support unified builtin.  */
> +          || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
>
> for gcc/testsuite/lib/target-supports.exp where only builtin is
> available to check toolchain existence(we may use inline assembly
> instead, but it expose an issue),  def_builtin should define
> __builtin_ia32_vpdpwssds_v4si  whenever avx512-vnni && avx512vl or
> avx-vnni exist, see gcc/testsuite/gcc.target/i386/avxvnni-builtin.c
>
> > Please also note that #defines won't be tested in e.g. sse-13.c, where:
> >
> > --q--
> >   Defining away "extern" and "__inline" results in all of them being
> >   compiled as proper functions.  */
> >
> > #define extern
> > #define __inline
> > --/q--
> >
> > so these defines should be reimplemented as extern inline functions.
> >
>
> I reimplemented _mm_dpbusd_avx_epi32 as extern inline functions, but
> still keep _mm_dpbusd_epi32 as macros. Maybe we should also add some
> explicit O0 tests for _mm_dpbusd_epi32.

Thanks, this and the explanation above clears my concerns.

+      /* Support unified builtin.  */
+      || (mask2 == OPTION_MASK_ISA2_AVXVNNI)

Please expand the above comment. We don't know what unified builtin is.

LGTM with the above change.

Thanks,
Uros.
> > Uros.
>
>
>
> --
> BR,
> Hongtao

Reply via email to