Richard Biener <rguent...@suse.de> writes:
> Targets recently got the ability to request the vector mode to be
> used for a vector epilogue (or the epilogue of a vector epilogue).  The
> following adds the ability for it to indicate the epilogue should use
> loop masking, irrespective of the --param vect-partial-vector-usage
> setting.

Is overriding the --param a good idea?  I can imagine we'd eventually
want a --param to override the override of the --param. :)

> The simple prototype below uses a separate flag from the epilogue
> mode, but I wonder how we want to more generally want to handle
> whether to use masking or not when iterating over modes.  Currently
> we mostly rely on --param vect-partial-vector-usage.  aarch64
> and riscv have both variable-length modes but also fixed-size modes
> where for the latter, like on x86, the target couldn't request
> a mode specifically with or without masking.  It seems both
> aarch64 and riscv fully rely on cost comparison and fully
> exploiting the mode iteration space (but not masked vs. non-masked?!)
> here?
>
> I was thinking of adding a vectorization_mode class that would
> encapsulate the mode and whether to allow masking or alternatively
> to make the vector_modes array (and the m_suggested_epilogue_mode)
> a std::pair of mode and mask flag?

Predicated vs. non-predicated SVE is interesting for the main loop.
The class sounds like it would be useful for that.

I suppose predicated vs. non-predicated SVE is also potentially
interesting for an unrolled epilogue, although there, it would in
theory be better to predicate only the last vector iteration
(i.e. part predicated, part unpredicated).

So I suppose unpredicated SVE epilogue loops might be interesting
until that partial predication is implemented, but I'm not sure how
useful unpredicated SVE epilogue loops would be "once" the partial
predication is supported.

I don't imagine we'll often know a priori for AArch64 which type
of vector epilogue is best.  Since switching between SVE and
Advanced SIMD is assumed to be essentially free, I think we'll
still rely on the current approach of costing both and seeing
which is cheaper.

Thanks,
Richard

>
> For the x86 case going the prototype way would be sufficient, we
> wouldn't want to say use a masked AVX epilogue for a AVX512 loop,
> so any further iteration on epilogue modes if the requested mode
> would fail to vectorize is OK to be unmasked.
>
> Any comments on this?  You are not yet using m_suggested_epilogue_mode
> to get more than one vector epilogue, this might be a way to add
> heuristics when to use a masked epilogue.
>
> Thanks,
> Richard.
>
>       * tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
>       Add masked output parameter and return m_masked_epilogue.
>       (vector_costs::m_masked_epilogue): New tristate flag.
>       (vector_costs::vector_costs): Initialize m_masked_epilogue.
>       * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
>       flag to optionally initialize can_use_partial_vectors_p.
>       (vect_analyze_loop): For epilogues also get whether to use
>       a masked epilogue for this loop from the target and use
>       that for the first epilogue mode we try.
> ---
>  gcc/tree-vect-loop.cc | 29 +++++++++++++++++++++--------
>  gcc/tree-vectorizer.h | 12 +++++++++---
>  2 files changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 2d1a6883e6b..4af510ff20c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3407,6 +3407,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
> *shared,
>                    const vect_loop_form_info *loop_form_info,
>                    loop_vec_info orig_loop_vinfo,
>                    const vector_modes &vector_modes, unsigned &mode_i,
> +                  int masked_p,
>                    machine_mode &autodetected_vector_mode,
>                    bool &fatal)
>  {
> @@ -3415,6 +3416,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
> *shared,
>  
>    machine_mode vector_mode = vector_modes[mode_i];
>    loop_vinfo->vector_mode = vector_mode;
> +  if (masked_p != -1)
> +    loop_vinfo->can_use_partial_vectors_p = masked_p;
>    unsigned int suggested_unroll_factor = 1;
>    unsigned slp_done_for_suggested_uf = 0;
>  
> @@ -3580,7 +3583,7 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>        cached_vf_per_mode[last_mode_i] = -1;
>        opt_loop_vec_info loop_vinfo
>       = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> -                            NULL, vector_modes, mode_i,
> +                            NULL, vector_modes, mode_i, -1,
>                              autodetected_vector_mode, fatal);
>        if (fatal)
>       break;
> @@ -3665,19 +3668,24 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>       array may contain length-agnostic and length-specific modes.  Their
>       ordering is not guaranteed, so we could end up picking a mode for the 
> main
>       loop that is after the epilogue's optimal mode.  */
> +  int masked_p = -1;
>    if (!unlimited_cost_model (loop)
> -      && first_loop_vinfo->vector_costs->suggested_epilogue_mode () != 
> VOIDmode)
> +      && (first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
> +       != VOIDmode))
>      {
>        vector_modes[0]
> -     = first_loop_vinfo->vector_costs->suggested_epilogue_mode ();
> +     = first_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p);
>        cached_vf_per_mode[0] = 0;
>      }
>    else
>      vector_modes[0] = autodetected_vector_mode;
>    mode_i = 0;
>  
> -  bool supports_partial_vectors =
> -    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
> +  /* ???  If we'd compute LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P 
> unconditionally
> +     for the main loop we could re-use that as short-cut here.  Not if
> +     the suggested epilogue mode is different, of course.  */
> +  bool supports_partial_vectors = partial_vectors_supported_p () && 
> ((param_vect_partial_vector_usage != 0 && masked_p != 0)
> +                                || masked_p == 1);
>    poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
>  
>    loop_vec_info orig_loop_vinfo = first_loop_vinfo;
> @@ -3707,7 +3715,7 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>         opt_loop_vec_info loop_vinfo
>           = vect_analyze_loop_1 (loop, shared, &loop_form_info,
>                                  orig_loop_vinfo,
> -                                vector_modes, mode_i,
> +                                vector_modes, mode_i, masked_p,
>                                  autodetected_vector_mode, fatal);
>         if (fatal)
>           break;
> @@ -3738,6 +3746,10 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>               break;
>           }
>  
> +       /* ???  x86 does not have masked / unmasked modes, not sure
> +          if SVE fixed-size modes would support masking?  That said,
> +          the modes array should be changed to pair(mode, masked_p).  */
> +       masked_p = -1;
>         if (mode_i == vector_modes.length ())
>           break;
>       }
> @@ -3748,12 +3760,13 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>  
>        /* When we selected a first vectorized epilogue, see if the target
>        suggests to have another one.  */
> +      masked_p = -1;
>        if (!unlimited_cost_model (loop)
> -       && (orig_loop_vinfo->vector_costs->suggested_epilogue_mode ()
> +       && (orig_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p)
>             != VOIDmode))
>       {
>         vector_modes[0]
> -         = orig_loop_vinfo->vector_costs->suggested_epilogue_mode ();
> +         = orig_loop_vinfo->vector_costs->suggested_epilogue_mode (masked_p);
>         cached_vf_per_mode[0] = 0;
>         mode_i = 0;
>       }
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 7aa2b02b63c..6022bc0d55a 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1693,7 +1693,7 @@ public:
>    unsigned int outside_cost () const;
>    unsigned int total_cost () const;
>    unsigned int suggested_unroll_factor () const;
> -  machine_mode suggested_epilogue_mode () const;
> +  machine_mode suggested_epilogue_mode (int &masked) const;
>  
>  protected:
>    unsigned int record_stmt_cost (stmt_vec_info, vect_cost_model_location,
> @@ -1717,8 +1717,12 @@ protected:
>    unsigned int m_suggested_unroll_factor;
>  
>    /* The suggested mode to be used for a vectorized epilogue or VOIDmode,
> -     determined at finish_cost.  */
> +     determined at finish_cost.  m_masked_epilogue is whether in case
> +     of non-VOIDmode epilogue mode the epilogue should use masked
> +     vectorization, regardless of --param vect-partial-vector-usage
> +     setting.  If -1 then the --param setting takes precedence.  */
>    machine_mode m_suggested_epilogue_mode;
> +  int m_masked_epilogue;
>  
>    /* True if finish_cost has been called.  */
>    bool m_finished;
> @@ -1734,6 +1738,7 @@ vector_costs::vector_costs (vec_info *vinfo, bool 
> costing_for_scalar)
>      m_costs (),
>      m_suggested_unroll_factor(1),
>      m_suggested_epilogue_mode(VOIDmode),
> +    m_masked_epilogue (-1),
>      m_finished (false)
>  {
>  }
> @@ -1794,9 +1799,10 @@ vector_costs::suggested_unroll_factor () const
>  /* Return the suggested epilogue mode.  */
>  
>  inline machine_mode
> -vector_costs::suggested_epilogue_mode () const
> +vector_costs::suggested_epilogue_mode (int &masked_p) const
>  {
>    gcc_checking_assert (m_finished);
> +  masked_p = m_masked_epilogue;
>    return m_suggested_epilogue_mode;
>  }

Reply via email to