On Fri, Jul 21, 2023 at 8:08 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi,
>
> The function vect_update_epilogue_niters which has been
> removed by r14-2281 has some code taking care of that if
> there is only one scalar iteration left for epilogue then
> we won't try to vectorize it any more.
>
> Although costing should be able to care about it eventually,
> I think we still want this special casing without costing
> enabled, so this patch is to add it back in function
> vect_analyze_loop_costing, and make it more general for
> both main and epilogue loops as Richi suggested, it can fix
> some exposed failures on Power10:
>
>  - gcc.target/powerpc/p9-vec-length-epil-{1,8}.c
>  - gcc.dg/vect/slp-perm-{1,5,6,7}.c
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu, powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -----
>         PR tree-optimization/110740
>
> gcc/ChangeLog:
>
>         * tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize a
>         loop with a single scalar iteration.
> ---
>  gcc/tree-vect-loop.cc | 55 ++++++++++++++++++++++++++-----------------
>  1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..92d2abde094 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>       epilogue we can also decide whether the main loop leaves us
>       with enough iterations, prefering a smaller vector epilog then
>       also possibly used for the case we skip the vector loop.  */
> -  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> -      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>      {
>        widest_int scalar_niters
>         = wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
> @@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>                                % lowest_vf + gap);
>             }
>         }
> -
> -      /* Check that the loop processes at least one full vector.  */
> -      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      if (known_lt (scalar_niters, vf))
> +      /* Reject vectorizing for a single scalar iteration, even if
> +        we could in principle implement that using partial vectors.  */
> +      unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
> +      if (scalar_niters <= peeling_gap + 1)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "loop does not have enough iterations "
> -                            "to support vectorization.\n");
> +                            "not vectorized: loop only has a single "
> +                            "scalar iteration.\n");
>           return 0;
>         }
>
> -      /* If we need to peel an extra epilogue iteration to handle data
> -        accesses with gaps, check that there are enough scalar iterations
> -        available.
> -
> -        The check above is redundant with this one when peeling for gaps,
> -        but the distinction is useful for diagnostics.  */
> -      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -         && known_le (scalar_niters, vf))
> +      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>         {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "loop does not have enough iterations "
> -                            "to support peeling for gaps.\n");
> -         return 0;
> +         /* Check that the loop processes at least one full vector.  */
> +         poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +         if (known_lt (scalar_niters, vf))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "loop does not have enough iterations "
> +                                "to support vectorization.\n");
> +             return 0;
> +           }
> +
> +         /* If we need to peel an extra epilogue iteration to handle data
> +            accesses with gaps, check that there are enough scalar iterations
> +            available.
> +
> +            The check above is redundant with this one when peeling for gaps,
> +            but the distinction is useful for diagnostics.  */
> +         if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +             && known_le (scalar_niters, vf))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "loop does not have enough iterations "
> +                                "to support peeling for gaps.\n");
> +             return 0;
> +           }
>         }
>      }
>
> --
> 2.39.3

Reply via email to