On Tue, Aug 17, 2021 at 10:43 PM Richard Biener via Gcc-patches
<[email protected]> wrote:
>
> On Tue, Aug 17, 2021 at 3:29 PM Richard Biener via Gcc-patches
> <[email protected]> wrote:
> >
> > This is an attempt to start moving the x86 backend to use
> > standard pattern names for [mask_]gather_load and [mask_]scatter_store
> > rather than using the builtin_{gather,scatter} target hooks.
> >
> > I've started with AVX2 gathers and given x86 only supports masked
> > gather I only implemented mask_gather_load. Note while for
> > the builtin_gather case the vectorizer will provide an all-true
> > mask operand for non-masked gathers this capability does not
> > exist for the IFN path yet, so only testcases with actual masked
> > gathers will work.
> >
> > If this looks reasonable on the backend side I'll see to first
> > complete the vectorizer part, ripping out the target hook and
> > arranging for the missing pieces. Another one is the support
> > for SImode indices with DFmode data which requires unpacking
> > the index vector and actually recognizing the IFN.
> >
> > 2021-08-17 Richard Biener <[email protected]>
> >
> > * tree-vect-data-refs.c (vect_check_gather_scatter):
> > Always use internal functions.
> > * config/i386/sse.md
> > (mask_gather_load<mode><vec_gather_idxsi>): New expander.
> > (mask_gather_load<mode><vec_gather_idxdi>): Likewise.
> > ---
> > gcc/config/i386/sse.md | 56 +++++++++++++++++++++++++++++++++++++++
> > gcc/tree-vect-data-refs.c | 4 +--
> > 2 files changed, 57 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 3957c86c3df..40bec98d9f7 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -23232,12 +23232,22 @@
> > (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
> > (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
> > (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> > +(define_mode_attr vec_gather_idxsi
> > + [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
> > + (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
> > + (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
> > + (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])
> >
> > (define_mode_attr VEC_GATHER_IDXDI
> > [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> > (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
> > (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
> > (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> > +(define_mode_attr vec_gather_idxdi
> > + [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
> > + (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
> > + (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
> > + (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])
> >
> > (define_mode_attr VEC_GATHER_SRCDI
> > [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> > @@ -23245,6 +23255,29 @@
> > (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
> > (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
> >
> > +(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
> > + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > + (unspec:VEC_GATHER_MODE
> > + [(pc)
> > + (mem:<ssescalarmode>
> > + (match_par_dup 6
> > + [(match_operand 1 "vsib_address_operand")
> > + (match_operand:<VEC_GATHER_IDXSI>
> > + 2 "register_operand")
> > + (match_operand:SI 4 "const1248_operand")
> > + (match_operand:SI 3 "const0_operand")]))
> > + (mem:BLK (scratch))
> > + (match_operand:<sseintvecmode> 5 "register_operand")]
>
> One problem of these is that when AVX512[VL] is enabled we get a AVX512 mask
> mode here and while the internal function expansion check succeeds (it
> never checks
> the mask operand!), RTL expansion fails unexpectedly because of this mismatch.
>
‘mask_gather_loadmn’
Like ‘gather_loadmn’, but takes an extra mask operand as operand 5. Bit i of
the mask is set if element i of the result should be loaded from memory and
clear if element i of the result should be set to zero.
According to the document, the mask here needs to be an avx512 mask,
would it be ok to use a vector mask?
> I suppose a more complicated define_mode_attr might do the trick or do I
> need to add && !TARGET_AVX512F to these expanders?
>
> I've meanwhile posted a patch to make the vectorizer fall back to
> masked_ when non-masked_ variants are not available and that seems to work
> fine at least
‘gather_loadmn’
Load several separate memory locations into a vector of mode m. Operand 1
is a scalar base address and operand 2 is a vector of mode n containing offsets
from that base. Operand 0 is a destination vector with the same number of
elements as n. For each element index i:
Also it seems operand 0 should have the same number of elements as
operand 2, but vgatherdq has v2di as operand 0 and v4si as operand2.
Considering we already have
"gather_load<mode><VEC_GATHER_IDXSI_lower>", either it's not checked
in the middle end for the same element number or the existing expander
is not used.
BTW, you can define mask_gather_load<mode><vec_gather_idxsi> like
below and emit_insn (avx2_gathersi<mode> directly.
(define_expand "mask_gather_load<mode><VEC_GATHER_IDXSI_lower>"
[(match_operand:VEC_GATHER_MODE 0 "register_operand")
(match_operand 1 "vsib_address_operand")
(match_operand:<VEC_GATHER_IDXSI> 2 "register_operand")
(match_operand:SI 3 "const0_operand")
(match_operand:SI 4 "const1248_operand")
(match_operand:<sseintvecmode> 5 "register_operand")
]
"TARGET_AVX2 && TARGET_USE_GATHER"
{
operands[5] = lowpart_subreg (<VEC_GATHER_MODE>mode, operands[5]);
/* According to the document, src here needs to be const0_rtx. */
emit_insn (avx2_gathersi<mode> (operands[0],
CONST0_RTX<VEC_GATHER_MODE>mode,operands[1], operands[2], operands[5],
operands[4]);
DONE;
})
> Richard.
>
> > + UNSPEC_GATHER))
> > + (clobber (match_scratch:VEC_GATHER_MODE 7))])]
> > + "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > + operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
> > + operands[6]
> > + = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3],
> > + operands[5]), UNSPEC_VSIBADDR);
> > +})
> > +
> > (define_expand "avx2_gathersi<mode>"
> > [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > (unspec:VEC_GATHER_MODE
> > @@ -23306,6 +23339,29 @@
> > (set_attr "prefix" "vex")
> > (set_attr "mode" "<sseinsnmode>")])
> >
> > +(define_expand "mask_gather_load<mode><vec_gather_idxdi>"
> > + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > + (unspec:VEC_GATHER_MODE
> > + [(pc)
> > + (mem:<ssescalarmode>
> > + (match_par_dup 6
> > + [(match_operand 1 "vsib_address_operand")
> > + (match_operand:<VEC_GATHER_IDXDI>
> > + 2 "register_operand")
> > + (match_operand:SI 4 "const1248_operand ")
> > + (match_operand:SI 3 "const0_operand")]))
> > + (mem:BLK (scratch))
> > + (match_operand:<sseintvecmode> 5 "register_operand")]
> > + UNSPEC_GATHER))
> > + (clobber (match_scratch:VEC_GATHER_MODE 7))])]
> > + "TARGET_AVX2 && TARGET_USE_GATHER"
> > +{
> > + operands[5] = gen_lowpart_SUBREG (<VEC_GATHER_SRCDI>mode, operands[5]);
> > + operands[6]
> > + = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[1], operands[2],
> > + operands[4]), UNSPEC_VSIBADDR);
> > +})
> > +
> > (define_expand "avx2_gatherdi<mode>"
> > [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> > (unspec:VEC_GATHER_MODE
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index d594c0a1b1e..67564b45c32 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -3824,9 +3824,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info,
> > loop_vec_info loop_vinfo,
> >
> > /* True if we should aim to use internal functions rather than
> > built-in functions. */
> > - bool use_ifn_p = (DR_IS_READ (dr)
> > - ? supports_vec_gather_load_p ()
> > - : supports_vec_scatter_store_p ());
> > + bool use_ifn_p = true;
> >
> > base = DR_REF (dr);
> > /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
> > --
> > 2.31.1
--
BR,
Hongtao