Thanks for looking at this.

"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi,
>
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available.  But
> it caused some vector with length test cases to fail on
> Power.
>
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue that: we call function 
> vect_need_peeling_or_partial_vectors_p in function
> vect_analyze_loop_2, since it's in analysis phase, for
> the epilogue loop, we could get the wrong information like
> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
> answer for vect_need_peeling_or_partial_vectors_p.
> For the epilogue loop in this failure specific, the niter
> that we get is 58 (should be 1), vf is 2.
>
> For epilogue loop with partial vectors, it would use the
> same VF as the main loop, so it won't be able to use full
> vector, this patch is to exclude epilogue loop for the
> check vect_need_peeling_or_partial_vectors_p in
> vect_analyze_loop_2.

Hmm.  For better or worse, I think the analysis phase is actually
operating on the assumption that the vector code needs to handle
all scalar iterations, even in the epilogue case.  We actually
rely on that to implement VECT_COMPARE_COSTS (although it wasn't
the original motivation for handling epilogues this way).

Perhaps we should expand the functionality (and name)
of determine_peel_for_niter so that it also computes
LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
that flag once we know the final epilogue scalar iteration count,
just like we recompute LOOP_VINFO_PEELING_FOR_NITER.

As a sanity check, I think it would also be good for the
renamed function to do the following parts of vect_analyze_loop_2:

  /* If epilog loop is required because of data accesses with gaps,
     one additional iteration needs to be peeled.  Check if there is
     enough iterations for vectorization.  */
  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
    {
      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);

      if (known_lt (wi::to_widest (scalar_niters), vf))
        return opt_result::failure_at (vect_location,
                                       "loop has no enough iterations to"
                                       " support peeling for gaps.\n");
    }

  /* If we're vectorizing an epilogue loop, the vectorized loop either needs
     to be able to handle fewer than VF scalars, or needs to have a lower VF
     than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
                                   "Vectorization factor too high for"
                                   " epilogue loop.\n");

and then assert that no failure occurs when called for epilogues
from vect_do_peeling.  So the function would be doing three things:

- compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
  in vect_analyze_loop_2

- perform the checks quoted above

- what the function currently does

That's all speculation though -- I haven't tried any of this.

Andrea, how should we handle this?  Is it something you'd have time to
look at?

Thanks,
Richard

>
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue, the 64bit/32bit pairs are able to
> use full vector, fixed in the patch accordingly.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>
> Is it OK for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>       * tree-vect-loop.c (vect_analyze_loop_2): Don't check
>       vect_need_peeling_or_partial_vectors_p for the epilogue loop.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c 
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index cfae9bbc927..5d2357aabfa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -9,8 +9,7 @@
>  #include "p9-vec-length-6.h"
>  
>  /* It can use normal vector load for constant vector load.  */
> -/* { dg-final { scan-assembler-not   {\mstxv\M} } } */
> -/* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
> -/* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
> +/* 64bit/32bit pairs won't use partial vectors.  */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index ab627fbf029..7273e998a99 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2278,7 +2278,8 @@ start_over:
>      {
>        /* Don't use partial vectors if we don't need to peel the loop.  */
>        if (param_vect_partial_vector_usage == 0
> -       || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> +       || (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +           && !vect_need_peeling_or_partial_vectors_p (loop_vinfo)))
>       LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>        else if (vect_verify_full_masking (loop_vinfo)
>              || vect_verify_loop_lens (loop_vinfo))

Reply via email to