On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
> Hi,
>
> Currently, when a loop is vectorized we adjust its nb_iterations_upper_bound
> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
> is an upper bound for (<number of loop iterations> - 1), and therefore simply
> dividing it by VF in many cases gives us a bound greater than the real one.
> The correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).

Yeah, that seems correct.

> Also, the decrement due to peeling for gaps should happen before we scale
> by VF, because peeling applies to the scalar loop, not the vectorized one.

That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
is peeled as scalar operations.  We do not account for the amount
of known prologue peeling (if peeling for alignment and the misalignment
is known at compile-time) - that would be peeling of scalar iterations.

But it would be interesting to know why we need the != 0 check - static
cost modelling should have disabled vectorization if the vectorized body
isn't run.

> This patch modifies nb_iterations_upper_bound computation to resolve
> these issues.

You do not adjust the ->nb_iterations_estimate accordingly.

> Running regression testing, I got one failure due to an optimized loop.
> Here is the loop:
>
> foo (signed char s)
> {
>   signed char i;
>   for (i = 0; i < s; i++)
>     yy[i] = (signed int) i;
> }
>
> Here we vectorize for AVX512 using VF=64.  The original loop has at most 127
> iterations, and therefore the vectorized loop may be executed only once.
> With the patch applied, the compiler detects this and transforms the loop into
> a BB with just stores of constant vectors into yy.  The test was adjusted
> to increase the number of possible iterations.  A copy of the original test
> was added to check that we can optimize out the original loop.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?

I'd like to see testcases covering the corner cases: give them upper
bound estimates by adjusting known array sizes, and also cover
the case of peeling for gaps.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-04-21  Ilya Enkovich  <ilya.enkov...@intel.com>
>
>         * tree-vect-loop.c (vect_transform_loop): Fix
>         nb_iterations_upper_bound computation for vectorized loop.
>
> gcc/testsuite/
>
> 2016-04-21  Ilya Enkovich  <ilya.enkov...@intel.com>
>
>         * gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
>         optimization of vector loop.
>         * gcc.target/i386/vect-unpack-3.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> index 4825248..51c518e 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> @@ -6,19 +6,22 @@
>
>  #define N 120
>  signed int yy[10000];
> +signed char zz[10000];
>
>  void
> -__attribute__ ((noinline)) foo (signed char s)
> +__attribute__ ((noinline,noclone)) foo (int s)
>  {
> -   signed char i;
> +   int i;
>     for (i = 0; i < s; i++)
> -     yy[i] = (signed int) i;
> +     yy[i] = zz[i];
>  }
>
>  void
>  avx512bw_test ()
>  {
>    signed char i;
> +  for (i = 0; i < N; i++)
> +    zz[i] = i;
>    foo (N);
>    for (i = 0; i < N; i++)
>      if ( (signed int)i != yy [i] )
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> new file mode 100644
> index 0000000..eb8a93e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math -mavx512bw -save-temps" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check.h"
> +
> +#define N 120
> +signed int yy[10000];
> +
> +void
> +__attribute__ ((noinline)) foo (signed char s)
> +{
> +   signed char i;
> +   for (i = 0; i < s; i++)
> +     yy[i] = (signed int) i;
> +}
> +
> +void
> +avx512bw_test ()
> +{
> +  signed char i;
> +  foo (N);
> +  for (i = 0; i < N; i++)
> +    if ( (signed int)i != yy [i] )
> +      abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index d813b86..da98211 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>    /* Reduce loop iterations by the vectorization factor.  */
>    scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
>                       expected_iterations / vectorization_factor);
> -  loop->nb_iterations_upper_bound
> -    = wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor);
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>        && loop->nb_iterations_upper_bound != 0)
>      loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
> +  loop->nb_iterations_upper_bound
> +    = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
> +                     vectorization_factor) - 1;
> +
>    if (loop->any_estimate)
>      {
>        loop->nb_iterations_estimate
