On Wed, 18 Aug 2021, Hongtao Liu wrote:

> On Wed, Aug 18, 2021 at 5:54 PM Richard Biener <rguent...@suse.de> wrote:
> >
> >
> > So in the end I seem to be able to combine AVX & AVX512 arriving
> > at the following which passes basic testing.  I will now see to
> > teach the vectorizer the required "promotion" to handle
> > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
> >
> > Meanwhile, do you see any hole in the below?  If not I'll
> > do mask_scatter_store accordingly (that will be simpler since
> > AVX doesn't have scatter).
> >
> > Thanks,
> > Richard.
> >
> > From 706d3ac96b384eaad249cc83ec542ec643e21a4c Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguent...@suse.de>
> > Date: Tue, 17 Aug 2021 15:14:38 +0200
> > Subject: [PATCH] move x86 to use gather/scatter internal functions
> > To: gcc-patches@gcc.gnu.org
> >
> > This moves the x86 backend to use standard pattern names for
> > mask_gather_load and mask_scatter_store rather than using the
> > builtin_{gather,scatter} target hooks.
> >
> > The patch starts with mask_gather_load covering AVX and AVX512
> > and omits gather_load, which doesn't match an actual instruction.
> > The vectorizer will appropriately set up an all-ones mask when
> > necessary.
> >
> > The vectorizer still lacks support for unpacking, and thus for
> > mixed-size gathers like mask_gather_loadv4dfv4si.
> >
> > 2021-08-17  Richard Biener  <rguent...@suse.de>
> >
> >         * config/i386/sse.md
> >         (vec_gather_idxsi, vec_gather_idxdi): New lower-case
> >         mode attributes for VEC_GATHER_IDX{SI,DI}.
> >         (VEC_GATHER_MODEX): New mode iterator.
> >         (mask_gather_load<mode><vec_gather_idxsi>): New expander.
> >         (mask_gather_load<mode><vec_gather_idxdi>): Likewise.
> > ---
> >  gcc/config/i386/sse.md | 53 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 53 insertions(+)
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 13889687793..6ae826c268a 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -23640,12 +23640,22 @@
> >                        (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
> >                        (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
> >                        (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> > +(define_mode_attr vec_gather_idxsi
> > +                     [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
> > +                      (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
> > +                      (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
> > +                      (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])
> >
> >  (define_mode_attr VEC_GATHER_IDXDI
> >                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> >                        (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
> >                        (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
> >                        (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> > +(define_mode_attr vec_gather_idxdi
> > +                     [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
> > +                      (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
> > +                      (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
> > +                      (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])
> >
> >  (define_mode_attr VEC_GATHER_SRCDI
> >                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> > @@ -23653,6 +23663,30 @@
> >                        (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
> >                        (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
> >
> > +(define_mode_iterator VEC_GATHER_MODEX
> > +  [V2DI V2DF V4DI V4DF V4SI V4SF V8SI V8SF
> > +   (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> > +   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
> > +
> > +(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
> > +  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
> > +   (match_operand 1 "vsib_address_operand")
> > +   (match_operand:<VEC_GATHER_IDXSI> 2 "register_operand")
> > +   (match_operand:SI 3 "const0_operand")
> > +   (match_operand:SI 4 "const1248_operand")
> > +   (match_operand 5 "register_operand")]
> > +  "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > +  rtx scratch = gen_reg_rtx (<MODE>mode);
> > +  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
> > +  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
> No need for the 64-byte modes since an AVX512 mask will be used there.
> And shouldn't we also handle 32-byte vector float modes?

Whoops, yes - that should have been GET_MODE_SIZE (<MODE>mode) != 64.
The condition is supposed to fend off all AVX mask modes here.
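Concretely, the guard in the expander body would then read (untested):

  if (GET_MODE_SIZE (<MODE>mode) != 64 && !TARGET_AVX512VL)
    operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);

i.e. only the sub-512-bit paths without AVX512VL lower the mask operand
to a vector-mode subreg; 64-byte modes (and AVX512VL) keep the AVX512
mask mode.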

> > +    operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
> > +  emit_insn (gen_<avx2_avx512>_gathersi<mode> (operands[0], scratch,
> I guess you want <avx2_avx512> to automatically match
> avx2_gathersi<mode> and avx512vl_gathersi<mode>, but <avx2_avx512>
> will only match avx2 for 256/128-bit modes.
> 
> (define_mode_attr avx2_avx512
>   [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f")
>    (V2DI "avx2") (V4DI "avx2") (V8DI "avx512f")
>    (V4SF "avx2") (V8SF "avx2") (V16SF "avx512f")
>    (V2DF "avx2") (V4DF "avx2") (V8DF "avx512f")
>    (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")])

Hmm yeah.  I merged it too much I guess.  I suppose

  if (TARGET_AVX512VL)
    gen_<avx512>_gathersi<mode> (...);
  else
    gen_<avx2_avx512>_gathersi<mode> (...);

should do the trick.  I'll do these adjustments and will re-test.
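For reference, the full adjusted expander body would then look roughly
like this (an untested sketch with the != 64 fix from above folded in;
"..." stands for the remaining index/scale operands):

  rtx scratch = gen_reg_rtx (<MODE>mode);
  emit_move_insn (scratch, CONST0_RTX (<MODE>mode));
  if (TARGET_AVX512VL)
    emit_insn (gen_<avx512>_gathersi<mode> (operands[0], scratch, ...));
  else
    {
      if (GET_MODE_SIZE (<MODE>mode) != 64)
	operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
      emit_insn (gen_<avx2_avx512>_gathersi<mode> (operands[0],
						   scratch, ...));
    }
  DONE;

so the AVX512VL case always takes the mask-mode pattern, and the
fallback path only lowers the mask to a vector subreg for the AVX2
128/256-bit patterns.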

Richard.
