On Tue, Aug 17, 2021 at 3:29 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This is an attempt to start moving the x86 backend to use
> standard pattern names for [mask_]gather_load and [mask_]scatter_store
> rather than using the builtin_{gather,scatter} target hooks.
>
> I've started with AVX2 gathers and given x86 only supports masked
> gather I only implemented mask_gather_load.  Note while for
> the builtin_gather case the vectorizer will provide an all-true
> mask operand for non-masked gathers this capability does not
> exist for the IFN path yet, so only testcases with actual masked
> gathers will work.
>
> If this looks reasonable on the backend side I'll see to first
> complete the vectorizer part, ripping out the target hook and
> arranging for the missing pieces.  Another one is the support
> for SImode indices with DFmode data which requires unpacking
> the index vector and actually recognizing the IFN.
>
> 2021-08-17  Richard Biener  <rguent...@suse.de>
>
>         * tree-vect-data-refs.c (vect_check_gather_scatter):
>         Always use internal functions.
>         * config/i386/sse.md
>         (mask_gather_load<mode><vec_gather_idxsi>): New expander.
>         (mask_gather_load<mode><vec_gather_idxdi>): Likewise.
> ---
>  gcc/config/i386/sse.md    | 56 +++++++++++++++++++++++++++++++++++++++
>  gcc/tree-vect-data-refs.c |  4 +--
>  2 files changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3957c86c3df..40bec98d9f7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -23232,12 +23232,22 @@
>                        (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
>                        (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
>                        (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
> +(define_mode_attr vec_gather_idxsi
> +                     [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
> +                      (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
> +                      (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
> +                      (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])
>
>  (define_mode_attr VEC_GATHER_IDXDI
>                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>                        (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>                        (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
>                        (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> +(define_mode_attr vec_gather_idxdi
> +                     [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
> +                      (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
> +                      (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
> +                      (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])
>
>  (define_mode_attr VEC_GATHER_SRCDI
>                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> @@ -23245,6 +23255,29 @@
>                        (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
>                        (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
>
> +(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
> +  [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> +                  (unspec:VEC_GATHER_MODE
> +                    [(pc)
> +                     (mem:<ssescalarmode>
> +                       (match_par_dup 6
> +                         [(match_operand 1 "vsib_address_operand")
> +                          (match_operand:<VEC_GATHER_IDXSI>
> +                             2 "register_operand")
> +                          (match_operand:SI 4 "const1248_operand")
> +                          (match_operand:SI 3 "const0_operand")]))
> +                     (mem:BLK (scratch))
> +                     (match_operand:<sseintvecmode> 5 "register_operand")]

One problem of these is that when AVX512[VL] is enabled we get a AVX512 mask
mode here and while the internal function expansion check succeeds (it
never checks
the mask operand!), RTL expansion fails unexpectedly because of this mismatch.

I suppose a more complicated define_mode_attr might do the trick or do I
need to add && !TARGET_AVX512F to these expanders?

I've meanwhile posted a patch to make the vectorizer fall back to
masked_ when non-masked_ variants are not available and that seems to work
fine at least.

Richard.

> +                    UNSPEC_GATHER))
> +             (clobber (match_scratch:VEC_GATHER_MODE 7))])]
> +  "TARGET_AVX2 && TARGET_USE_GATHER"
> +{
> +  operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
> +  operands[6]
> +    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3],
> +                                       operands[5]), UNSPEC_VSIBADDR);
> +})
> +
>  (define_expand "avx2_gathersi<mode>"
>    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
>                    (unspec:VEC_GATHER_MODE
> @@ -23306,6 +23339,29 @@
>     (set_attr "prefix" "vex")
>     (set_attr "mode" "<sseinsnmode>")])
>
> +(define_expand "mask_gather_load<mode><vec_gather_idxdi>"
> +  [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
> +                  (unspec:VEC_GATHER_MODE
> +                    [(pc)
> +                     (mem:<ssescalarmode>
> +                       (match_par_dup 6
> +                         [(match_operand 1 "vsib_address_operand")
> +                          (match_operand:<VEC_GATHER_IDXDI>
> +                             2 "register_operand")
> +                          (match_operand:SI 4 "const1248_operand ")
> +                          (match_operand:SI 3 "const0_operand")]))
> +                     (mem:BLK (scratch))
> +                     (match_operand:<sseintvecmode> 5 "register_operand")]
> +                    UNSPEC_GATHER))
> +             (clobber (match_scratch:VEC_GATHER_MODE 7))])]
> +  "TARGET_AVX2 && TARGET_USE_GATHER"
> +{
> +  operands[5] = gen_lowpart_SUBREG (<VEC_GATHER_SRCDI>mode, operands[5]);
> +  operands[6]
> +    = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[1], operands[2],
> +                                       operands[4]), UNSPEC_VSIBADDR);
> +})
> +
>  (define_expand "avx2_gatherdi<mode>"
>    [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
>                    (unspec:VEC_GATHER_MODE
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index d594c0a1b1e..67564b45c32 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -3824,9 +3824,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>
>    /* True if we should aim to use internal functions rather than
>       built-in functions.  */
> -  bool use_ifn_p = (DR_IS_READ (dr)
> -                   ? supports_vec_gather_load_p ()
> -                   : supports_vec_scatter_store_p ());
> +  bool use_ifn_p = true;
>
>    base = DR_REF (dr);
>    /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
> --
> 2.31.1

Reply via email to