On Wed, Aug 18, 2021 at 6:28 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Wed, 18 Aug 2021, Richard Biener wrote:
>
> > So in the end I seem to be able to combine AVX & AVX512, arriving
> > at the following, which passes basic testing.  I will now see to
> > teaching the vectorizer the required "promotion" to handle
> > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
> >
> > Meanwhile, do you see any hole in the below?  If not, I'll
> > do mask_scatter_store accordingly (that will be simpler since
> > AVX doesn't have scatter).
>
> There seems to be one more complication ... we have
>
> (define_expand "avx2_gatherdi<mode>"
>   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
>                    (unspec:VEC_GATHER_MODE
>                      [(match_operand:<VEC_GATHER_SRCDI> 1 "register_operand")
>                       (mem:<ssescalarmode>
>                         (match_par_dup 6
>                           [(match_operand 2 "vsib_address_operand")
>                            (match_operand:<VEC_GATHER_IDXDI>
>                              3 "register_operand")
>
> but VEC_GATHER_IDXDI is
>
> (define_mode_attr VEC_GATHER_IDXDI
>   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>    (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>    (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
>    (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
>
> I'd have expected (V4SF "V4DI") for example, or (V8SF "V8DI").

Quoting the Intel SDM for VGATHERDPS/VGATHERQPS:

  VEX.128 version: For dword indices, the instruction will gather four
  single-precision floating-point values.  For qword indices, the
  instruction will gather two values and zero the upper 64 bits of the
  destination.
  VEX.256 version: For dword indices, the instruction will gather eight
  single-precision floating-point values.  For qword indices, the
  instruction will gather four values and zero the upper 128 bits of the
  destination.

So, for the expander name, it should be v2sfv2di and v4sfv4di for IDXDI
under AVX2, and v8sfv8di under AVX512.
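To make the dword/qword asymmetry concrete, here is a small example of
my own (not from the patch) using the AVX2 intrinsics:

#include <immintrin.h>

/* Eight dword indices in a ymm gather eight floats into a ymm
   (vgatherdps: V8SF destination, V8SI index).  */
__m256 gather_dword_idx (const float *base, __m256i idx32)
{
  return _mm256_i32gather_ps (base, idx32, 4);
}

/* Four qword indices in a ymm gather only four floats, into an xmm
   (vgatherqps: V4SF destination, V4DI index) -- hence the expander
   name mask_gather_loadv4sfv4di rather than ...v8sfv4di.  */
__m128 gather_qword_idx (const float *base, __m256i idx64)
{
  return _mm256_i64gather_ps (base, idx64, 4);
}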
----cut pattern----
(define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
	(unspec:VEC_GATHER_MODE
	  [(pc)
	   (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
	     [(unspec:P
		[(match_operand:P 2 "vsib_address_operand" "Tv")
		 (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
		 (match_operand:SI 5 "const1248_operand" "n")]
		UNSPEC_VSIBADDR)])
	   (mem:BLK (scratch))
	   (match_operand:<VEC_GATHER_SRCDI> 4 "register_operand" "1")]
	  UNSPEC_GATHER))
   (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))]
  "TARGET_AVX2"
{
  if (<VEC_GATHER_MODE:MODE>mode != <VEC_GATHER_SRCDI>mode)
    return "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %x0|%x0, %6, %4}";
----cut end---------------

We are using the trick of the operand modifier %x0 to force printing the
xmm name of the destination register, and we do ugly things when
expanding the builtin:

--------------cut from i386-expand.c-----------
	case IX86_BUILTIN_GATHER3DIV16SI:
	  if (target == NULL_RTX)
	    target = gen_reg_rtx (V8SImode);
	  emit_insn (gen_vec_extract_lo_v16si (target, subtarget));
	  break;
	case IX86_BUILTIN_GATHER3DIV8SF:
	case IX86_BUILTIN_GATHERDIV8SF:
	  if (target == NULL_RTX)
	    target = gen_reg_rtx (V4SFmode);
	  emit_insn (gen_vec_extract_lo_v8sf (target, subtarget));
-----------cut from i386-expand.c-----------
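The same halving shows up at the intrinsics level; here is my own
sketch (I believe this is the path through IX86_BUILTIN_GATHER3DIV16SI):

#include <immintrin.h>

/* With eight qword indices in a zmm, vpgatherqd can only produce
   eight dwords, so the user-visible result is a ymm even though the
   machine pattern's destination mode is V16SI -- hence the
   gen_vec_extract_lo_v16si above.  */
__m256i gather8_epi32 (const int *base, __m512i qidx)
{
  return _mm512_i64gather_epi32 (qidx, base, 4);
}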
> From the ISA docs I conclude that vgatherqps is not supported for
> a %zmm destination but V8SF and V8DI is a supported combination?
>
> (define_mode_attr VEC_GATHER_IDXSI
>   [(V2DI "V4SI") (V4DI "V4SI") (V8DI "V8SI")
>    (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
>    (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
>    (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
>
> looks like what I expect.  So shouldn't we use a special
> VEC_GATHERDI_MODE iterator instead that drops V16{SI,SF} and adjust
> VEC_GATHER_IDXDI accordingly?  Even
>
> (define_expand "<avx512>_gatherdi<mode>"
>   [(parallel [(set (match_operand:VI48F 0 "register_operand")
>                    (unspec:VI48F
>
> "supports" V16SI and V16SF gathers, leading to broken patterns?
>
> I can test if anything goes wrong fixing VEC_GATHER_IDXDI to
>
> (define_mode_attr VEC_GATHER_IDXDI
>   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>    (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>    (V4SI "V4DI") (V8SI "V8DI") (V16SI "V8DI")
>    (V4SF "V4DI") (V8SF "V8DI") (V16SF "V8DI")])
>
> Thanks,
> Richard.
>
> > Thanks,
> > Richard.
> >
> > From 706d3ac96b384eaad249cc83ec542ec643e21a4c Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguent...@suse.de>
> > Date: Tue, 17 Aug 2021 15:14:38 +0200
> > Subject: [PATCH] move x86 to use gather/scatter internal functions
> > To: gcc-patches@gcc.gnu.org
> >
> > This moves the x86 backend to use standard pattern names for
> > mask_gather_load and mask_scatter_store rather than using the
> > builtin_{gather,scatter} target hooks.
> >
> > The patch starts with mask_gather_load, covering AVX and AVX512,
> > and omits gather_load, which doesn't match an actual instruction.
> > The vectorizer will appropriately set up an all-ones mask when
> > necessary.
> >
> > The vectorizer still lacks support for unpacking, thus mixed-size
> > gathers like mask_gather_loadv4dfv4si.
> >
> > 2021-08-17  Richard Biener  <rguent...@suse.de>
> >
> > 	* config/i386/sse.md
> > 	(vec_gather_idxsi, vec_gather_idxdi): New lower-case
> > 	mode attributes for VEC_GATHER_IDX{SI,DI}.
> > 	(VEC_GATHER_MODEX): New mode iterator.
> > 	(mask_gather_load<mode><vec_gather_idxsi>): New expander.
> > 	(mask_gather_load<mode><vec_gather_idxdi>): Likewise.
> > ---
> >  gcc/config/i386/sse.md | 53 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 53 insertions(+)
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 13889687793..6ae826c268a 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -23640,12 +23640,22 @@
> >     (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
> >     (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
> >     (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> > +(define_mode_attr vec_gather_idxsi
> > +  [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
> > +   (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
> > +   (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
> > +   (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])
> >
> >  (define_mode_attr VEC_GATHER_IDXDI
> >    [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> >     (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
> >     (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
> >     (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> > +(define_mode_attr vec_gather_idxdi
> > +  [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
> > +   (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
> > +   (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
> > +   (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])
> >
> >  (define_mode_attr VEC_GATHER_SRCDI
> >    [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> > @@ -23653,6 +23663,30 @@
> >     (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
> >     (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
> >
> > +(define_mode_iterator VEC_GATHER_MODEX
> > +  [V2DI V2DF V4DI V4DF V4SI V4SF V8SI V8SF
> > +   (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> > +   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
> > +
> > +(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
> > +  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
> > +   (match_operand 1 "vsib_address_operand")
> > +   (match_operand:<VEC_GATHER_IDXSI> 2 "register_operand")
> > +   (match_operand:SI 3 "const0_operand")
> > +   (match_operand:SI 4 "const1248_operand")
> > +   (match_operand 5 "register_operand")]
> > +  "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > +  rtx scratch = gen_reg_rtx (<MODE>mode);
> > +  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
> > +  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
> > +    operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
> > +  emit_insn (gen_<avx2_avx512>_gathersi<mode> (operands[0], scratch,
> > +					       operands[1], operands[2],
> > +					       operands[5], operands[4]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "avx2_gathersi<mode>"
> >    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > 	            (unspec:VEC_GATHER_MODE
> > @@ -23714,6 +23748,25 @@
> >      (set_attr "prefix" "vex")
> >      (set_attr "mode" "<sseinsnmode>")])
> >
> > +(define_expand "mask_gather_load<mode><vec_gather_idxdi>"
> > +  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
> > +   (match_operand 1 "vsib_address_operand")
> > +   (match_operand:<VEC_GATHER_IDXDI> 2 "register_operand")
> > +   (match_operand:SI 3 "const0_operand")
> > +   (match_operand:SI 4 "const1248_operand")
> > +   (match_operand 5 "register_operand")]
> > +  "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > +  rtx scratch = gen_reg_rtx (<MODE>mode);
> > +  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
> > +  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
> > +    operands[5] = gen_lowpart_SUBREG (<VEC_GATHER_SRCDI>mode, operands[5]);
> > +  emit_insn (gen_<avx2_avx512>_gatherdi<mode> (operands[0], scratch,
> > +					       operands[1], operands[2],
> > +					       operands[5], operands[4]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "avx2_gatherdi<mode>"
> >    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > 	            (unspec:VEC_GATHER_MODE
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
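For reference, the kind of loop the patch lets the vectorizer handle
through the standard mask_gather_load optab is something like the
sketch below (my own example, not from the patch).  With -mavx2 and
TARGET_USE_GATHER the conditional indexed load can become a masked
gather, e.g. mask_gather_loadv8sfv8si, and per the patch description
even an unconditional src[idx[i]] goes through mask_gather_load with
an all-ones mask, since gather_load itself is omitted:

void
foo (float *restrict dst, const float *restrict src,
     const int *restrict idx, const int *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dst[i] = src[idx[i]];	/* indexed load -> masked gather */
}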
(define_expand "avx2_gatherdi<mode>" > > [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand") > > (unspec:VEC_GATHER_MODE > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) -- BR, Hongtao